[fpc-devel] Patch, font rendering on Arm-Linux devices.
Daniël Mantione
daniel.mantione at freepascal.org
Thu Feb 28 12:36:06 CET 2008
On Thu, 28 Feb 2008, Yury Sidorov wrote:
>> Yes, but if you have an array of them (as we have in this case),
>> considerably more of these records will fit in the cache. Therefore you
>> will have considerably fewer cache misses. This becomes even more serious
>> when the processor in question does not have prefetching; in that case,
>> traversing the array will cause cache miss after cache miss, and a smaller
>> array will then have fewer of these misses.
>
> You are right. An array of packed records is a bit more efficient than an
> array of non-packed records, at least on modern x86 CPUs.
>
> I ran some benchmarks and got the following on a Core Duo:
> 2070ms - for non-packed
> 1910ms - for packed
>
> But for CPUs which do not support misaligned data access, packed records are
> speed killers and should be used only as a last resort.
I am not 100% sure about this. Your Core Duo has an array traversal detector
which activates prefetching. An ARM does not have such logic and will
suffer cache miss after cache miss.
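To make the cache argument concrete, here is a minimal sketch in C rather than Pascal. The record layout and the names glyph_plain, glyph_packed and elems_per_line are hypothetical and not taken from the patch under discussion, and __attribute__((packed)) is the GCC/Clang spelling, used here as a stand-in for a Pascal packed record.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, for illustration only. */
struct glyph_plain {
    uint8_t  flag;    /* typically followed by 3 padding bytes */
    uint32_t offset;  /* aligned to 4 bytes on common ABIs */
    uint8_t  width;   /* typically followed by 3 padding bytes */
};

/* Same fields with padding removed (GCC/Clang extension). */
struct __attribute__((packed)) glyph_packed {
    uint8_t  flag;
    uint32_t offset;  /* now sits at offset 1, i.e. misaligned */
    uint8_t  width;
};

/* Number of whole array elements that fit in one cache line. */
static size_t elems_per_line(size_t line_bytes, size_t elem_bytes)
{
    return line_bytes / elem_bytes;
}
```

The packed variant is 6 bytes; the plain one is usually 12 on ABIs that align uint32_t to 4 bytes. Assuming a 32-byte cache line, that is 5 packed elements per line against 2 plain ones, so a linear traversal touches fewer lines with the packed array.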
However, it is certain that a manual unaligned load is more expensive
on ARM than a hardware unaligned load on x86.
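As a rough picture of what a "manual" unaligned load involves, here is a sketch in C. The function names are hypothetical, and the byte-wise version assumes the value is stored in little-endian order; the code a compiler actually generates for a packed field on ARM may differ in detail.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise load: four single-byte loads combined with shifts and ORs.
   Works at any address; assumes the value is stored little-endian. */
static uint32_t load_u32_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Portable alternative: memcpy into an aligned temporary, in host byte
   order. The compiler may turn this into a single load on targets that
   allow unaligned access, or into byte loads on targets that do not. */
static uint32_t load_u32_memcpy(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On a CPU with hardware support for unaligned access, the same read is a single load instruction, which is the cost difference referred to above.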
> Also, if a record is not an element of a large array, it is better to declare
> it as non-packed for all CPUs.
Yes.
Daniël