[fpc-devel] Patch, font rendering on Arm-Linux devices.

Thu Feb 28 09:40:27 CET 2008

On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote:
> Yury Sidorov wrote:
> > The patch removes packed record for some platforms.
> > IMO packed can be removed for all platforms. It will gain some
> > speed.
>
> I'd like to understand more this issue.
> Why are non packed records faster?
> The difference occurs at memory allocation or at memory access?

At memory access.

On x86 processors it's usually only a speed penalty (or has anyone ever
seen the AC flag turned on?); on other processors you may even have to
work around exceptions (e.g. bus errors), because the processor simply
refuses to read or write unaligned data. Then the only way to
circumvent the processor's refusal is to read/write the data byte by
byte or mask it out, which is slower than just reading or writing it.
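
To illustrate the byte-by-byte approach, here is a minimal sketch in C
(chosen only for illustration; it is not what FPC generates). The helper
names load_u16_bytewise and store_u16_bytewise are made up for this
example, and a little-endian byte order is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 16-bit value from any
   address, aligned or not, using only single-byte loads. */
static uint16_t load_u16_bytewise(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: write a 16-bit value in little-endian order
   using only single-byte stores. */
static void store_u16_bytewise(unsigned char *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFFu);  /* low byte first */
    p[1] = (unsigned char)(v >> 8);     /* then high byte */
}
```

Each 16-bit access becomes two byte accesses plus a shift and an OR,
which is where the extra cost comes from.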

Consider writing a 32-bit value that straddles two 32-bit words where the
processor can only access a single 32-bit word at an aligned address:

*_ _ _ _*_ _ _ _
|0|1|2|3|4|5|6|7|
    |_______|

Now the data you need spans bytes [2:5], but the processor
can only read a full 32 bits either at position 0 (reading bytes [0:3])
or at position 4 (reading bytes [4:7]). You'd need to read both processor
words, mask out the affected half of each, and write back
both words with the new data patched in between them.
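
The read/mask/write-back sequence described above could be sketched
roughly like this in C. This is only an illustration under stated
assumptions: the function names are hypothetical, memory is modelled as
two aligned 32-bit words, and byte k of a word is assumed to occupy bits
8k..8k+7 (little-endian numbering). With off = 2 it matches the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value at byte offset off (1..3)
   inside two adjacent aligned words, using whole-word accesses only. */
static void store_u32_straddling(uint32_t words[2], unsigned off,
                                 uint32_t value)
{
    assert(off >= 1 && off <= 3);
    unsigned shift = 8u * off;                     /* 8, 16 or 24 */
    uint32_t mask = (UINT32_C(1) << shift) - 1u;   /* low off bytes */

    uint32_t lo = words[0];                        /* read both words */
    uint32_t hi = words[1];

    /* Keep the low bytes of word 0, patch its high bytes. */
    lo = (lo & mask) | (value << shift);
    /* Patch the low bytes of word 1, keep its high bytes. */
    hi = (hi & ~mask) | (value >> (32u - shift));

    words[0] = lo;                                 /* write both back */
    words[1] = hi;
}

/* Hypothetical helper: the matching load, two word reads plus shifts. */
static uint32_t load_u32_straddling(const uint32_t words[2], unsigned off)
{
    assert(off >= 1 && off <= 3);
    unsigned shift = 8u * off;
    return (words[0] >> shift) | (words[1] << (32u - shift));
}
```

A single store thus costs two loads, two stores and several mask and
shift operations instead of one store instruction.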

So, no matter whether the processor handles it for you or the
compiler inserts the necessary code to do it, even a simple
increment is insanely expensive in terms of processor cycles.

Vinzent.