[fpc-devel] Patch, font rendering on Arm-Linux devices.

Thu Feb 28 12:36:06 CET 2008

On Thu, 28 Feb 2008, Yury Sidorov wrote:

>> Yes, but if you have an array of them (as we have in this case),
>> considerably more of these records will fit in the cache. Therefore you
>> will have considerably fewer cache misses. This becomes even more serious
>> when the processor in question does not have prefetching; in such a case,
>> traversing the array will cause cache miss after cache miss, and a smaller
>> array will then have fewer of these misses.
>
> You are right. An array of packed records is a bit more efficient than an
> array of non-packed records, at least on modern x86 CPUs.
>
> I ran some benchmarks on a Core Duo and got:
> 2070 ms - for non-packed
> 1910 ms - for packed
>
> But on CPUs that do not support misaligned data access, packed records are
> speed killers and should be used only as a last resort.

I am not 100% sure about this. Your Core Duo has an array traversal detector
that activates prefetching. An ARM does not have such logic and will
suffer cache miss after cache miss.
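
To make the cache argument concrete, here is a minimal sketch in C (not
Pascal, and not taken from the patch). The record layout, the names
natural_rec, packed_rec and records_per_line, and the 32-byte cache line
mentioned below are all assumptions chosen for illustration; the packed
attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with natural alignment. On common ABIs the
 * compiler inserts 3 padding bytes after 'tag' so that 'value'
 * starts on a 4-byte boundary. */
struct natural_rec {
    uint8_t  tag;
    uint32_t value;
};

/* The same fields without padding (GCC/Clang extension).
 * 'value' now lives at offset 1, which is a misaligned address. */
struct __attribute__((packed)) packed_rec {
    uint8_t  tag;
    uint32_t value;
};

/* Number of whole records of rec_bytes that fit in one cache line
 * of line_bytes. */
static size_t records_per_line(size_t line_bytes, size_t rec_bytes)
{
    assert(rec_bytes > 0);   /* avoid division by zero */
    return line_bytes / rec_bytes;
}
```

On a typical ABI, sizeof(struct natural_rec) is 8 and
sizeof(struct packed_rec) is 5, so an assumed 32-byte line holds 4 whole
non-packed records or 6 whole packed ones. Fewer lines have to be fetched
to walk the packed array, at the price of misaligned fields.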

However, it is certain that a manual unaligned load is more expensive
on ARM than a hardware unaligned load on x86.
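
As a rough illustration of what a "manual" unaligned load involves, the
sketch below assembles a 32-bit little-endian value one byte at a time.
It is written in C rather than Pascal, the function name
load_u32_le_bytewise is hypothetical, and it only approximates what a
compiler might emit for a misaligned field on a CPU without hardware
support for such accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from an arbitrary (possibly
 * misaligned) address using four byte loads, three shifts and three
 * ORs, instead of a single word load. */
static uint32_t load_u32_le_bytewise(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

An aligned field needs only one load instruction, so every access to a
misaligned field costs several extra instructions when done this way.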

> Also, if a record is not an element of a large array, it is better to
> declare it as non-packed on all CPUs.

Yes.

Daniël