[fpc-devel] Patch, font rendering on Arm-Linux devices.
Vinzent Hoefler
JeLlyFish.software at gmx.net
Thu Feb 28 11:37:39 CET 2008
On Thursday 28 February 2008 11:25, Daniël Mantione wrote:
> On Thu, 28 Feb 2008, Vinzent Hoefler wrote:
> > On Thursday 28 February 2008 09:16, Daniël Mantione wrote:
> >> Memory access. What happens is that the non-packed version causes
> >> more cache misses.
> >
OMG. I'm soooo confused. ;) I read "that the packed version causes more
cache misses" here. That was the part where I didn't understand why.
> > Please elaborate. If the (unaligned) data is crossing a cache-line,
> > thus causing two full cache-line reads, I'd understand that, but
> > once it's in the cache, it wouldn't matter anymore?
>
> Yes, but if you have an array of them (as we have in this case),
> considerably more of these records will fit in the cache.
Yes, that's what I figured, so we seem to be on the same page here,
but tracing back the discussion, it read:
-- 8< --
> I'd like to understand more this issue.
> Why are non packed records faster?
Cache thrashing. One of the most underestimated performance killers in
modern software.
> The difference occurs at memory allocation or at memory access?
Memory access. What happens is that the non-packed version causes more
cache misses.
-- 8< --
The first part tells me non-packed records are faster, but the second
part tells me that the non-packed version also causes more cache
misses, and thus is slower. That got me confused, I think.
Of course, the net result only depends on the benchmark you're using. ;)
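
To put rough numbers on the two effects discussed above, here is a
minimal sketch in C (not the FPC code in question). The record layout,
the function names and the cache-line size passed in are all
assumptions made for illustration, and the packed attribute is a
GCC/Clang extension. It only counts how many records fit into a given
number of bytes and how many array elements straddle a cache-line
boundary; it says nothing about which variant wins on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, for illustration only. The compiler normally
   inserts padding after 'flag' so that 'value' is naturally aligned. */
struct rec_unpacked {
    uint8_t  flag;
    uint32_t value;
};

/* Same fields without padding (GCC/Clang extension). */
struct __attribute__((packed)) rec_packed {
    uint8_t  flag;
    uint32_t value;
};

/* Number of whole records of 'record_size' bytes that fit into
   'cache_bytes' bytes. */
size_t records_in_cache(size_t cache_bytes, size_t record_size)
{
    assert(record_size > 0);
    return cache_bytes / record_size;
}

/* Number of elements in an array of 'count' records that cross a
   cache-line boundary, assuming the array starts on a line boundary. */
size_t straddling_records(size_t count, size_t record_size,
                          size_t line_size)
{
    size_t n = 0;
    assert(record_size > 0 && line_size > 0);
    for (size_t i = 0; i < count; i++) {
        size_t first_line = (i * record_size) / line_size;
        size_t last_line  = (i * record_size + record_size - 1) / line_size;
        if (first_line != last_line)
            n++;
    }
    return n;
}
```

Calling records_in_cache() with sizeof(struct rec_packed) and with
sizeof(struct rec_unpacked) shows the density gain from packing, while
straddling_records() shows how many packed elements need two line
fills. Any cost of unaligned access on the target CPU is not modelled
here.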
Vinzent.