[fpc-devel] Patch, font rendering on Arm-Linux devices.
mschnell at lumino.de
Thu Feb 28 10:21:31 CET 2008
Micha Nelissen wrote:
> In addition to what the others said, think of it like your 32 bit
> processor suddenly being an 8 bit processor: it has to manually load 4
> times 8 bit, arrange them into a 32 bit value, and only then use it.
> With non packed, it can use the value directly.
On an x86 the compiler does not need to generate any additional code,
as the processor _can_ do misaligned accesses in hardware (there are
other processors that can't and need more code).
If it accesses a misaligned 32 bit value it does two accesses (not 4):
e.g. one for 8 bits and one for the remaining 24 bits (when reading,
each of the two accesses fetches a full 32 bit word anyway).
But all this happens only inside the core of the chip and is thus _very_
fast, as the chip contains a (1st level) cache which is connected to
the second level cache (also within the chip) via a data path that is
128 bits wide or more.
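The two-access read described above can be sketched in software. This is only an illustration of the arithmetic, not of what the x86 hardware literally does: the helper names (read_aligned_word, read_u32_misaligned) are made up for this example, and a little-endian byte order is assumed.

```python
def read_aligned_word(mem, addr):
    """Read one aligned 32 bit little-endian word; addr must be a multiple of 4."""
    assert addr % 4 == 0
    return int.from_bytes(mem[addr:addr + 4], "little")


def read_u32_misaligned(mem, addr):
    """Return (value, number_of_aligned_accesses) for a 32 bit read at any addr."""
    offset = addr % 4
    base = addr - offset
    low = read_aligned_word(mem, base)
    if offset == 0:
        # Aligned case: a single access delivers the whole value.
        return low, 1
    # Misaligned case: the value straddles two aligned words.
    high = read_aligned_word(mem, base + 4)
    shift = 8 * offset
    # The upper (32 - shift) bits of `low` become the low end of the value,
    # the lower `shift` bits of `high` become its high end.
    value = ((low >> shift) | (high << (32 - shift))) & 0xFFFFFFFF
    return value, 2


# Hypothetical test memory: bytes 00 01 02 ... 0f
mem = bytes(range(16))
value, accesses = read_u32_misaligned(mem, 3)
print(hex(value), accesses)
```

With an address of 3, the first aligned word contributes 8 bits and the second contributes 24 bits, which is the "once 8 bit and once 24 bit" case from the text.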
Transferring data from/to the 1st level cache imposes a lot more delay
than the misaligned access. Thus, if there are many instances of a record
variable that are used for calculation, it might be much faster to use
the packed version, as more instances fit into the cache. If there are
only a few, the unpacked version should usually be faster.
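To give a rough feeling for the cache argument, here is a small sketch that compares how many cache lines an array of records occupies in the packed and the unpacked layout. The record (one byte followed by one 32 bit value), the 64 byte cache line and the helper cache_lines_needed are assumptions chosen for illustration; real numbers depend on the record and on the CPU.

```python
import struct

CACHE_LINE = 64  # assumed cache line size in bytes

# Example record: one 8 bit field followed by one 32 bit field.
UNPACKED = struct.calcsize("@BI")  # native alignment, padding before the 32 bit field
PACKED = struct.calcsize("=BI")    # no padding at all, like a packed record


def cache_lines_needed(record_size, count, line=CACHE_LINE):
    """Number of cache lines touched by `count` contiguous records (rounded up)."""
    return -(-(record_size * count) // line)


count = 1000
print("unpacked:", UNPACKED, "bytes per record,",
      cache_lines_needed(UNPACKED, count), "cache lines")
print("packed:  ", PACKED, "bytes per record,",
      cache_lines_needed(PACKED, count), "cache lines")
```

The fewer cache lines a calculation has to pull in, the less time is spent on cache transfers, which is why the packed layout can win when many instances are processed.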