[fpc-devel] Patch, font rendering on Arm-Linux devices.

Fri Feb 29 15:43:58 CET 2008

On Fri, 29 Feb 2008, Christian Iversen wrote:

>> Memory access. What happens is that the non-packed version causes more
>> cache misses. A cache miss costs many cycles on a modern CPU; a misaligned
>> read just costs an extra memory access (which is fast if cached) on x86,
>> and an extra load instruction on ARM. This is much cheaper than a cache miss.
>
> It's much worse than that. Some architectures simply _can't_ do unaligned 
> access, and they will trigger an exception.
>
> This exception will in many configurations be caught by the OS, which then
> might simulate the read by doing 2 reads, putting the result together,
> writing into the application memory, and doing a task switch.
>
> This, in total, is several _orders of magnitude_ worse than unaligned access 
> on a supported platform.
>
> Of course, unaligned access in itself is pretty bad.

True, but irrelevant, because the discussion was under the assumption that
an unaligned read is done using the "unaligned" pseudo-function. Unless
there is a bug in the compiler, the use of "unaligned" will never cause an
exception.

Instead, "unaligned" will simulate an unaligned load with two loads and
some shifting or rotation. On the ARM, where most data-processing
instructions can shift or rotate an operand, this isn't that bad of a penalty.
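
As a rough illustration of the "two loads plus shifting" idea, here is a
sketch in C rather than Pascal. It is not the code FPC actually generates
for "unaligned"; the function name load_u32_two_aligned and the
little-endian byte numbering are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value starting at an arbitrary byte offset, using only
   aligned 32-bit word reads. Bytes are numbered little-endian within
   each word (assumption for this sketch). */
uint32_t load_u32_two_aligned(const uint32_t *words, size_t byte_offset)
{
    size_t   index = byte_offset / 4;                  /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8;  /* misalignment in bits */
    uint32_t lo = words[index];                        /* first aligned load */

    if (shift == 0)
        return lo;                                     /* already aligned: one load is enough */

    uint32_t hi = words[index + 1];                    /* second aligned load */

    /* Combine the upper part of the first word with the lower part of
       the second word. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

On ARM the shifts can be folded into the combining instruction, which is
why the extra cost stays small.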

Therefore, I wouldn't be surprised if, even on ARM, arrays of packed
structures are faster than arrays of unpacked structures.
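
To make the cache argument concrete, a small C sketch (again not Pascal)
comparing the memory footprint of a packed and an unpacked record. The
record layout and the names padded_rec, packed_rec and array_bytes are
hypothetical, and __attribute__((packed)) is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpacked: the compiler inserts padding after 'tag' so that 'value'
   is naturally aligned. */
struct padded_rec {
    uint8_t  tag;
    uint32_t value;
};

/* Packed: no padding, so 'value' sits at an unaligned offset. */
struct packed_rec {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

/* Total bytes occupied by an array of 'count' records of the given size. */
size_t array_bytes(size_t record_size, size_t count)
{
    return record_size * count;
}
```

With this layout the packed array touches fewer bytes for the same number
of records, so fewer cache lines have to be fetched when walking it.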

Daniël