[fpc-devel] Patch, font rendering on Arm-Linux devices.

Thu Feb 28 10:26:39 CET 2008

Michael Schnell wrote:
> If it accesses a misaligned 32-bit value it does two accesses (not 4): 
> e.g. once 8 bit and once 24 bit (when reading, each of the accesses is 
> the same 32 bits anyway).

Logically, you should think about it the way I explained. That Intel 
optimized this to reduce the speed impact is a separate issue: 
internally the processor still needs separate "8 bit" data paths 
and has to shift the bytes into the right order.

Perhaps this behaviour is specified in their optimization documents, or 
maybe you have the VHDL source? :-)
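
To make the shifting concrete, here is a rough sketch in C of how a 
misaligned 32-bit read can be assembled from two aligned word reads. 
The helper name load32_via_aligned is made up for illustration, the 
words are treated as little-endian values, and this says nothing about 
how any particular processor actually implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the 32-bit little-endian value that starts
   at byte offset `off`, using only aligned 32-bit words from `words`.
   Each word is taken as a little-endian value (byte 0 in the low bits).
   The caller must make sure words[off / 4 + 1] exists when off % 4 != 0. */
static uint32_t load32_via_aligned(const uint32_t *words, size_t off)
{
    size_t   idx   = off / 4;               /* aligned word holding the first byte */
    unsigned shift = (unsigned)(off % 4) * 8;
    uint32_t lo    = words[idx];

    if (shift == 0)
        return lo;                          /* aligned: a single access is enough */

    uint32_t hi = words[idx + 1];           /* second aligned access */

    /* Drop the unwanted low bytes of the first word, then fill the top
       with the low bytes of the second word. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

With the words 0x44332211 and 0x88776655, a read at offset 1 takes 
three bytes from the first word and one from the second: two accesses 
plus shifts, not four.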

> Transferring data from/to the 1st level cache imposes a lot more delay 
> than the misaligned access. Thus if there are many instances of a record 
> variable that are used for calculation, it might be much faster to use 
> the packed version. If there are only a few, usually the unpacked 
> version should be faster.

Show me the benchmark results ;-)
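
As a starting point for such a benchmark, here is a minimal sketch in C 
of the two layouts being compared. The struct and function names are 
invented for illustration, and __attribute__((packed)) is a GCC/Clang 
extension. It only shows the memory footprint and measures nothing 
about speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler normally pads after `tag` so that
   `value` is aligned. */
struct PlainRec {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout (GCC/Clang extension): no padding, so `value` sits at
   offset 1 and is misaligned. */
struct PackedRec {
    uint8_t  tag;
    uint32_t value;
} __attribute__((packed));

/* Bytes occupied by n instances of each layout. */
static size_t plain_bytes(size_t n)  { return n * sizeof(struct PlainRec); }
static size_t packed_bytes(size_t n) { return n * sizeof(struct PackedRec); }
```

On the usual ABIs the plain record takes 8 bytes and the packed one 5, 
so a large array of packed records touches fewer cache lines. Whether 
that outweighs the cost of the misaligned accesses is exactly what 
would have to be measured.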

Micha