[fpc-devel] Patch, font rendering on Arm-Linux devices.

Fri Feb 29 17:05:39 CET 2008

From: "Daniël Mantione" <daniel.mantione at freepascal.org>
>>> Instead, "unaligned" will simulate an unaligned load with two loads
>>> and some rotation etc. On the ARM, where every mnemonic can rotate
>>> operands, this isn't that bad of a penalty.
>>>
>>> Therefore, I wouldn't be surprised if, even on ARM, arrays of packed
>>> structures are faster than arrays of unpacked structures.
>>
>> That's possible. Why would it be faster, btw? Better cache coherency?
>
>Like I mentioned, unlike modern x86 processors, ARM processors cannot
>detect an array traversal and preload the array into the cache. If the
>array is not in cache, you get cache miss after cache miss.
>
>A cache miss is very expensive given the latencies of modern memory. A
>smaller array results in fewer cache misses.
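
To make the size argument above concrete, here is a minimal sketch in C (not FPC code). The record layout, the names PlainRec, PackedRec and cache_lines_touched, and the 32-byte cache line used below are all assumptions for illustration; `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one byte followed by a 32-bit field. */
typedef struct {
    uint8_t  tag;
    uint32_t value;   /* typically preceded by 3 padding bytes (ABI-dependent) */
} PlainRec;

typedef struct __attribute__((packed)) {
    uint8_t  tag;
    uint32_t value;   /* no padding, so this field is unaligned */
} PackedRec;

static_assert(sizeof(PackedRec) == 5, "packed layout should have no padding");

/* Number of cache lines covered by an array of `count` elements,
   assuming the array starts on a cache-line boundary. */
static size_t cache_lines_touched(size_t count, size_t elem_size,
                                  size_t line_size)
{
    return (count * elem_size + line_size - 1) / line_size;
}
```

On an ABI where PlainRec is 8 bytes, 1000 elements with an assumed 32-byte line cover 250 lines unpacked and 157 lines packed. Whether that saving outweighs the cost of the unaligned accesses is exactly what a benchmark has to decide.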

I ran my benchmark on an ARM mobile device and got the following results:
2080 ms - non-packed
4450 ms - packed

The packed version is more than twice as slow, which clearly shows that
unaligned access kills performance on ARM...
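
For reference, the "two loads and some rotation" emulation mentioned in the quoted text can be sketched roughly as follows. This is an illustration in C, not the code FPC actually generates; the function name load_u32_unaligned and the little-endian word view are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value at an arbitrary byte offset using only aligned
   word loads. The buffer is viewed as aligned 32-bit words, with byte 0
   in the low 8 bits of words[0] (little-endian view, an assumption). */
static uint32_t load_u32_unaligned(const uint32_t *words, size_t n_words,
                                   size_t byte_off)
{
    size_t   idx   = byte_off / 4;
    unsigned shift = (unsigned)(byte_off % 4) * 8;

    assert(idx < n_words);
    uint32_t lo = words[idx];
    if (shift == 0)
        return lo;                 /* aligned case: a single load */

    assert(idx + 1 < n_words);     /* value straddles two words */
    uint32_t hi = words[idx + 1];

    /* Combine the upper bytes of the first word with the lower bytes
       of the second one. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

Every misaligned field access then costs two loads plus the shifts and the OR instead of one load, which is consistent with the slowdown measured above.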

Yury.