[fpc-devel] Patch, font rendering on Arm-Linux devices.

Thu Feb 28 11:37:39 CET 2008

On Thursday 28 February 2008 11:25, Daniël Mantione wrote:
> On Thu, 28 Feb 2008, Vinzent Hoefler wrote:
> > On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
> >> Memory access. What happens is that the non-packed version causes
> >> more cache misses.
> >

OMG. I'm soooo confused. ;) I read "that the packed version causes more 
cache misses" here. That was the part where I didn't understand why.

> > Please elaborate. If the (unaligned) data is crossing a cache-line,
> > thus causing two full cache-line reads, I'd understand that, but
> > once it's in the cache, it wouldn't matter anymore?
>
> Yes, but if you have an array of them (as we have in this case),
> considerably more of these records will fit in the cache.

Yes, that's what I figured, so it seems we're on the same page here. 
But tracing back the discussion, it read:

-- 8< --
> I'd like to understand more this issue.
> Why are non packed records faster?

Cache thrashing. One of the most underestimated performance killers in 
modern software.

> The difference occurs at memory allocation or at memory access?

Memory access. What happens is that the non-packed version causes more 
cache misses.

-- 8< --

The first part tells me non-packed records are faster, but the second 
part tells me that the non-packed version also causes more cache 
misses, and thus is slower. That is what got me confused, I think.
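
As an aside, here is a rough sketch of the two effects discussed above: 
more packed records fit in the cache, but some of them cross a 
cache-line boundary. It is in Python rather than Pascal, and the record 
layout (one byte plus one 32-bit integer), the 64-byte cache line and 
the helper names are all assumptions for illustration, not taken from 
the patch.

```python
import struct

CACHE_LINE = 64  # bytes; assumed, a common cache-line size

# Hypothetical record: one byte followed by one 32-bit integer.
ALIGNED_SIZE = struct.calcsize("@BI")  # native alignment, padding after the byte
PACKED_SIZE = struct.calcsize("=BI")   # no padding, like a packed record


def lines_touched(record_size, count):
    """Cache lines covered by `count` contiguous records starting at offset 0."""
    total = record_size * count
    return (total + CACHE_LINE - 1) // CACHE_LINE


def straddlers(record_size, count):
    """Number of records that cross a cache-line boundary."""
    return sum(
        1
        for i in range(count)
        if (i * record_size) % CACHE_LINE + record_size > CACHE_LINE
    )


if __name__ == "__main__":
    for name, size in (("aligned", ALIGNED_SIZE), ("packed", PACKED_SIZE)):
        print(name, size, "bytes/record,",
              lines_touched(size, 1000), "lines for 1000 records,",
              straddlers(size, 1000), "records straddle a line")
```

The packed array covers fewer cache lines, which is the "more records 
fit in the cache" argument. The straddling records are the cost of 
giving up alignment. Which effect wins is not something this sketch can 
tell; that takes a measurement on the actual hardware.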

Of course, the net result only depends on the benchmark you're using. ;)

Vinzent.