[fpc-devel] Patch, font rendering on Arm-Linux devices.
Yury Sidorov
jura at cp-lab.com
Fri Feb 29 17:05:39 CET 2008
From: "Daniël Mantione" <daniel.mantione at freepascal.org>
>>> Instead "unaligned" will simulate an unaligned load with two loads and
>>> some rotation etc. On the ARM, where every mnemonic can rotate operands,
>>> this isn't that bad of a penalty.
>>>
>>> Therefore, I wouldn't be surprised that even on ARM, arrays with packed
>>> structures are faster than arrays with unpacked structures.
>>
>> That's possible. Why would it be faster, btw? Better cache coherency?
>
>Like I mentioned, unlike modern x86 processors, ARM processors cannot
>detect an array traversal and preload the array into the cache. If the
>array is not in cache, you get cache miss after cache miss.
>
>A cache miss is very expensive with the latencies of modern memory. A
>smaller array results in fewer cache misses.
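
To put rough numbers on the size argument quoted above, here is a minimal
C sketch. It is not the benchmark code: the record layout (one byte plus
one 32-bit field) and all names are hypothetical, and the packed attribute
is the GCC/Clang spelling, used here as a stand-in for a Pascal packed
record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the compiler pads 'tag' so 'value' is 4-byte aligned. */
struct unpacked_item {
    uint8_t  tag;
    uint32_t value;
};

/* Same fields without padding (GCC/Clang attribute): 'value' sits at offset 1. */
struct __attribute__((packed)) packed_item {
    uint8_t  tag;
    uint32_t value;
};

/* Bytes occupied by an array of n records of each kind. */
size_t array_bytes_unpacked(size_t n) { return n * sizeof(struct unpacked_item); }
size_t array_bytes_packed(size_t n)   { return n * sizeof(struct packed_item); }

/* Sanity check of the layout assumptions stated above. */
void check_layout(void)
{
    assert(offsetof(struct unpacked_item, value) == 4);
    assert(offsetof(struct packed_item, value) == 1);
}
```

With this layout the packed array needs 5 bytes per element instead of 8,
so fewer cache lines are touched, but every second field is misaligned.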
I ran my benchmark on an ARM mobile device and got the following results:
2080ms - for non-packed
4450ms - for packed
It clearly shows that unaligned access kills performance on ARM...
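
For reference, the "two loads and some rotation" emulation mentioned in
the quote can be sketched in C roughly as follows. This only illustrates
the idea and is not the code FPC generates; the function name is
hypothetical and little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit value starting at an arbitrary byte
 * offset, using only aligned 32-bit loads from 'words'.
 * Assumes little-endian byte order. When the offset is not a multiple
 * of 4, the word after the one containing the offset must be readable.
 */
uint32_t load_u32_emulated(const uint32_t *words, size_t byte_offset)
{
    assert(words != NULL);

    size_t   idx   = byte_offset / 4;                  /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8;  /* misalignment in bits */

    uint32_t lo = words[idx];                          /* first aligned load */
    if (shift == 0)
        return lo;                                     /* aligned case: one load is enough */

    uint32_t hi = words[idx + 1];                      /* second aligned load */

    /* Combine the upper bytes of 'lo' with the lower bytes of 'hi'. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

Each misaligned read costs an extra load plus shift and OR work, which is
the overhead the packed case pays on every misaligned field.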
Yury.