[fpc-devel] Patch, font rendering on Arm-Linux devices.
Vinzent Hoefler
JeLlyFish.software at gmx.net
Thu Feb 28 09:40:27 CET 2008
On Tuesday 26 February 2008 17:27, Luiz Americo Pereira Camara wrote:
> Yury Sidorov wrote:
> > The patch removes packed record for some platforms.
> > IMO packed can be removed for all platforms. It will gain some
> > speed.
>
> I'd like to understand more this issue.
> Why are non packed records faster?
> The difference occurs at memory allocation or at memory access?
At memory access.
On x86 processors it's usually only a speed penalty (or has anyone ever
seen the AC flag turned on?); on other processors you may even have to
work around exceptions (e.g. bus errors), because the processor simply
refuses to read or write unaligned data. The only way around that
refusal is then to read or write the data byte by byte, or to mask it
out of the surrounding aligned words, which is slower than a single
plain read or write.
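As a rough illustration of the byte-by-byte route, here is a minimal
sketch in C. The function names are made up for this example, and a
little-endian byte order is assumed; this is not what any particular
compiler actually emits.

```c
#include <stdint.h>

/* Hypothetical helpers: access a 32-bit value at ANY address using only
   byte accesses, which are always aligned. Little-endian order assumed. */
static uint32_t load_u32_bytewise(const uint8_t *p)
{
    /* Four separate loads plus shifts and ORs instead of one load. */
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static void store_u32_bytewise(uint8_t *p, uint32_t v)
{
    /* Four separate stores instead of one. */
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```

An aligned field needs one load or one store; here each access turns
into four memory operations plus the shifting and combining.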
Consider writing a 32-bit value that straddles two 32-bit words, where
the processor can only access a single 32-bit word at an aligned
address:
*_ _ _ _*_ _ _ _
|0|1|2|3|4|5|6|7|
    |_______|
Now the data you need spans bytes [2:5], but the processor can only
read a full 32 bits either at position 0 (reading bytes [0:3]) or at
position 4 (reading bytes [4:7]). You'd need to read both processor
words, mask out the upper half of the first and the lower half of the
second, and write back both words with the new data patched in between.
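The two-word masking dance for a four-byte value at bytes [2:5] can be
sketched in C roughly as follows. Memory is modelled as an array of
aligned 32-bit words, byte k of a word is assumed to sit in bits
8k..8k+7 (little-endian numbering), and the function names are
invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value at byte offset `off`,
   touching memory only through aligned 32-bit word accesses. */
static void store_u32_via_words(uint32_t *mem, unsigned off, uint32_t v)
{
    unsigned word  = off / 4;        /* first word touched             */
    unsigned shift = (off % 4) * 8;  /* bit position inside that word  */

    if (shift == 0) {                /* aligned: one plain store       */
        mem[word] = v;
        return;
    }
    /* First word: keep the bytes below the value, patch the rest. */
    uint32_t lo_mask = 0xFFFFFFFFu << shift;
    mem[word] = (mem[word] & ~lo_mask) | (v << shift);

    /* Second word: patch the bytes at the bottom, keep the rest. */
    uint32_t hi_mask = 0xFFFFFFFFu >> (32 - shift);
    mem[word + 1] = (mem[word + 1] & ~hi_mask) | (v >> (32 - shift));
}

/* Hypothetical helper: the matching load, also two word reads. */
static uint32_t load_u32_via_words(const uint32_t *mem, unsigned off)
{
    unsigned word  = off / 4;
    unsigned shift = (off % 4) * 8;

    if (shift == 0)
        return mem[word];
    return (mem[word] >> shift) | (mem[word + 1] << (32 - shift));
}
```

With off = 2 the store reads two words, masks both, and writes two
words back, so an increment built from a load followed by a store
touches memory six times instead of twice.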
So, no matter whether the processor handles it for you or the compiler
inserts the necessary code to do it, even a simple increment is
insanely expensive in terms of processor cycles.
Vinzent.