[fpc-devel] Patch, font rendering on Arm-Linux devices.

Sun Mar 2 18:06:09 CET 2008

Daniël Mantione wrote:
> 
> 
> On Fri, 29 Feb 2008, Christian Iversen wrote:
> 
>> Daniël Mantione wrote:
>>>
>>>
>>> On Fri, 29 Feb 2008, Christian Iversen wrote:
>>>
>>>>> Instead "unaligned" will simulate an unaligned load with two loads
>>>>> and some rotation etc. On the ARM, where every mnemonic can rotate
>>>>> operands, this isn't that bad of a penalty.
>>>>>
>>>>> Therefore, I wouldn't be surprised that even on ARM, arrays with
>>>>> packed structures are faster than arrays with unpacked structures.
>>>>
>>>> That's possible. Why would it be faster, btw? Better cache coherency?
>>>
>>> Like I mentioned, unlike modern x86 processors, ARM processors cannot
>>> detect an array traversal and preload the array into the cache. If
>>> the array is not in cache, you get cache miss after cache miss.
>>
>> Unlike modern x86 processors?
>>
>> Granted, I haven't timed it, but most processors since early P4 models
>> are supposed to have "Streaming access detection", which is a fancy
>> way of saying array detection.
>>
>> Are you sure your information is current?
> 
> Please read again. I said modern x86 processors have it; ARM processors
> don't.

And that's why we have the prefetch inline procedure, and it is also one
reason why our move is ~10x faster than gcc's :)
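
For readers who have not used software prefetching, here is a minimal sketch of the idea in C. It is not the Free Pascal prefetch procedure or the RTL's move routine. It relies on the GCC/Clang builtin `__builtin_prefetch`, and the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical names chosen for illustration. The right distance depends on the CPU and memory system and would need to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many elements ahead to request. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting upcoming elements into the cache.
   On a core without hardware stream detection, the hint lets the
   memory fetch overlap with the additions instead of stalling on
   each cache miss. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (n - i > PREFETCH_DISTANCE)
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 0);
        total += data[i];
    }
    return total;
}
```

The prefetch is only a hint: the result is the same with or without it, so the only thing to verify on a given board is whether it actually helps.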