[fpc-devel] Patch, font rendering on Arm-Linux devices.

Fri Feb 29 01:55:32 CET 2008

Daniël Mantione wrote:
>
>
> On Tue, 26 Feb 2008, Luiz Americo Pereira Camara wrote:
>
>> Yury Sidorov wrote:
>>> The patch removes packed record for some platforms.
>>> IMO packed can be removed for all platforms. It will gain some speed.
>>
>> I'd like to understand this issue better.
>> Why are non-packed records faster?
>
> Cache thrashing. One of the most underestimated performance killers in
> modern software.
>
>> Does the difference occur at memory allocation or at memory access?
>
> Memory access. What happens is that the non-packed version causes more
> cache misses. A cache miss costs many cycles on a modern CPU; a
> misaligned read just costs an extra memory access (which is fast if
> cached) on x86, and an extra load instruction on ARM. This is much
> cheaper than a cache miss.

Thanks for all the explanations. I'm sure the change is worthwhile.

One more question:

VirtualTreeView tries to keep the fields of the (packed) record aligned
on dword boundaries by grouping smaller (one- or two-byte) fields
together or by adding dummy fields. Does this trick avoid the unaligned
memory accesses?

The real beast:

TVirtualNodePacked = packed record
    Index,                       // Offset 0
    ChildCount: Cardinal;        // Offset 4
    NodeHeight: Word;            // Offset 8
    States: TVirtualNodeStates;  // Offset 10 *
    Align: Byte;                 // Offset 14 **
    CheckState: TCheckState;     // Offset 15 **
    CheckType: TCheckType;       // Offset 16
    Dummy: Byte;                 // Offset 17
    TotalCount: Cardinal;        // Offset 18 *
    [...]

As far as I understand, the fields marked with * cause unaligned
accesses because they are not on a dword boundary. Right?
The fields marked with ** are not dword-aligned either, but since they
are one-byte fields there is no unaligned access. Right?
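
To double-check the offsets written in the comments, a small Free Pascal
program along these lines could print them. This is only a sketch:
TNodePrefix, FieldOffset, IsAligned and Report are names made up for the
test, the record stops where the excerpt above stops, and the three
VirtualTreeView types are replaced by stand-ins whose sizes (4, 1 and 1
bytes) are assumed from the offsets above. It also assumes the record
itself starts on a dword boundary.

```pascal
program CheckOffsets;
{$mode objfpc}

type
  // Stand-in for TVirtualNodeStates; assumed to be 4 bytes wide,
  // as the offsets 10 and 14 in the record above imply.
  TStatesStandIn = LongWord;

  // Only the fields shown in the excerpt; the real record continues.
  TNodePrefix = packed record
    Index,
    ChildCount: Cardinal;
    NodeHeight: Word;
    States: TStatesStandIn;
    Align: Byte;
    CheckState: Byte;  // stand-in for TCheckState (assumed 1 byte)
    CheckType: Byte;   // stand-in for TCheckType (assumed 1 byte)
    Dummy: Byte;
    TotalCount: Cardinal;
  end;

var
  N: TNodePrefix;

// Distance in bytes from the start of N to the given field.
function FieldOffset(FieldAddr: Pointer): PtrUInt;
begin
  Result := PtrUInt(FieldAddr) - PtrUInt(@N);
end;

// A field is naturally aligned when its offset is a multiple of its size.
function IsAligned(Ofs, Size: PtrUInt): Boolean;
begin
  Result := (Ofs mod Size) = 0;
end;

procedure Report(const FieldName: string; FieldAddr: Pointer; Size: PtrUInt);
var
  Ofs: PtrUInt;
begin
  Ofs := FieldOffset(FieldAddr);
  Write(FieldName:12, '  offset ', Ofs:2, '  size ', Size);
  if IsAligned(Ofs, Size) then
    WriteLn('  aligned')
  else
    WriteLn('  UNALIGNED');
end;

begin
  Report('Index',      @N.Index,      SizeOf(N.Index));
  Report('ChildCount', @N.ChildCount, SizeOf(N.ChildCount));
  Report('NodeHeight', @N.NodeHeight, SizeOf(N.NodeHeight));
  Report('States',     @N.States,     SizeOf(N.States));
  Report('Align',      @N.Align,      SizeOf(N.Align));
  Report('CheckState', @N.CheckState, SizeOf(N.CheckState));
  Report('CheckType',  @N.CheckType,  SizeOf(N.CheckType));
  Report('Dummy',      @N.Dummy,      SizeOf(N.Dummy));
  Report('TotalCount', @N.TotalCount, SizeOf(N.TotalCount));
end.
```

Under those assumptions it should flag only States (offset 10) and
TotalCount (offset 18). The one-byte fields can never be misaligned,
because every offset is a multiple of 1.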

And what about 64-bit systems? Should the fields be qword-aligned, or is
dword alignment still sufficient?
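
One way to probe this on a given target would be to let the compiler lay
out a non-packed record and look at where the fields end up. As far as I
know, the usual rule is natural alignment (each field aligned to its own
size), which would mean a Cardinal is still fine on a dword boundary
while a pointer wants a qword boundary on 64-bit targets. The sketch
below only prints what the compiler does; TProbe and OffsetOf are
hypothetical names, not taken from VirtualTreeView, and the output
depends on the platform.

```pascal
program NaturalAlign;
{$mode objfpc}
{$packrecords c}  // ask FPC for C-style field alignment

type
  // Hypothetical record, used only to observe the compiler's layout.
  TProbe = record
    Flag: Byte;
    Count: Cardinal;
    Next: Pointer;
  end;

var
  P: TProbe;

// Distance in bytes from the start of P to the given field.
function OffsetOf(FieldAddr: Pointer): PtrUInt;
begin
  Result := PtrUInt(FieldAddr) - PtrUInt(@P);
end;

begin
  WriteLn('SizeOf(Pointer) = ', SizeOf(Pointer));
  WriteLn('Count at offset   ', OffsetOf(@P.Count));
  WriteLn('Next  at offset   ', OffsetOf(@P.Next));
  WriteLn('SizeOf(TProbe)  = ', SizeOf(TProbe));
end.
```

Running it on a 32-bit and a 64-bit build side by side should show
whether only the pointer-sized field moves.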

Luiz