[fpc-devel] Patch, font rendering on Arm-Linux devices.
Christian Iversen
chrivers at iversen-net.dk
Fri Feb 29 15:47:23 CET 2008
Daniël Mantione wrote:
>
>
> On Fri, 29 Feb 2008, Christian Iversen wrote:
>
>>> Memory access. What happens is that the non-packed version causes
>>> more cache misses. A cache miss costs many cycles on a modern cpu, a
>>> misaligned read just costs an extra memory access (which is fast if
>>> cached) on x86, and an extra load instruction on ARM. This is much
>>> cheaper than a cache miss.
>>
>> It's much worse than that. Some architectures simply _can't_ do
>> unaligned access, and they will trigger an exception.
>>
>> This exception will in many configurations be caught by the OS, which
>> might then simulate the read by doing 2 reads, putting the result
>> together, writing into the application memory, and doing a task switch.
>>
>> This, in total, is several _orders of magnitude_ worse than unaligned
>> access on a supported platform.
>>
>> Of course, unaligned access in itself is pretty bad.
>
> True, but irrelevant, because the discussion was under the assumption
> that an unaligned read is done using the "unaligned" pseudo function.
> Unless there is a bug in the compiler, the use of "unaligned" will never
> cause an exception.
Oh, you're right of course. I didn't catch that part of the argument.
> Instead "unaligned" will simulate an unaligned load with two loads and
> some rotation etc. On the ARM, where every mnemonic can rotate operands,
> this isn't that bad of a penalty.
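For illustration, the "two loads and some rotation" idea can be sketched in
C roughly as follows. This is only a sketch of the general technique, not
what the compiler actually emits for "unaligned": the helper name
load_u32_unaligned is made up, and it assumes a little-endian target and
that the word after the one being read is readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the 32-bit value that starts at byte offset
   `off` inside an array of aligned 32-bit words, using only aligned loads.
   Assumes little-endian byte order. */
static uint32_t load_u32_unaligned(const uint32_t *words, size_t off)
{
    assert(words != NULL);

    size_t   idx   = off / 4;                 /* word holding the first byte */
    unsigned shift = (unsigned)(off % 4) * 8; /* bit offset inside that word */

    uint32_t lo = words[idx];                 /* first aligned load */
    if (shift == 0)
        return lo;                            /* already aligned, one load */

    uint32_t hi = words[idx + 1];             /* second aligned load */

    /* Drop the unwanted low bytes of `lo`, then fill the top of the
       result with the low bytes of `hi`. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With words holding 0x03020100 and 0x07060504, reading at byte offset 1 gives
0x04030201: two loads, two shifts and an OR, with no trap into the OS.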
>
> Therefore, I wouldn't be surprised that even on ARM, arrays with packed
> structures are faster than arrays with unpacked structures.
That's possible. Why would it be faster, btw? Better cache coherency?
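One way to put numbers on the cache-miss argument quoted above is to compare
how many cache lines an array of each layout occupies. The sketch below is
in C rather than Pascal; the record layout (glyph_packed, glyph_unpacked)
and the 32-byte line size are assumptions picked for illustration, and
__attribute__((packed)) is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a one-byte tag followed by a 32-bit field.
   With natural alignment the compiler usually inserts 3 padding bytes. */
struct glyph_unpacked {
    uint8_t  kind;
    uint32_t offset;
};

/* Same fields without padding (GCC/Clang extension). */
struct __attribute__((packed)) glyph_packed {
    uint8_t  kind;
    uint32_t offset;
};

/* Cache lines covered by `count` elements of `elem_size` bytes, assuming
   the array starts on a cache-line boundary. */
static size_t cache_lines_needed(size_t elem_size, size_t count,
                                 size_t line_size)
{
    assert(line_size > 0);
    return (elem_size * count + line_size - 1) / line_size;
}
```

On a typical ABI the unpacked record is 8 bytes and the packed one is 5, so
1000 elements need 250 lines unpacked versus 157 packed, assuming 32-byte
lines. Fewer lines to fetch means fewer potential misses when walking the
array, at the cost of the extra load and shift on each unaligned field.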
--
Kind regards
Christian Iversen