[fpc-devel] Patch, font rendering on Arm-Linux devices.

Christian Iversen chrivers at iversen-net.dk
Fri Feb 29 15:47:23 CET 2008


Daniël Mantione wrote:
> 
> 
> On Fri, 29 Feb 2008, Christian Iversen wrote:
> 
>>> Memory access. What happens is that the non-packed version causes
>>> more cache misses. A cache miss costs many cycles on a modern CPU; a
>>> misaligned read just costs an extra memory access (which is fast if
>>> cached) on x86, and an extra load instruction on ARM. This is much
>>> cheaper than a cache miss.
>>
>> It's much worse than that. Some architectures simply _can't_ do 
>> unaligned access, and they will trigger an exception.
>>
>> In many configurations this exception will be caught by the OS, which
>> might then simulate the read by doing 2 reads, putting the result
>> together, writing it into the application memory, and doing a task switch.
>>
>> This, in total, is several _orders of magnitude_ worse than unaligned 
>> access on a supported platform.
>>
>> Of course, unaligned access in itself is pretty bad.
> 
> True, but irrelevant, because the discussion was under the assumption
> that an unaligned read is done using the "unaligned" pseudo function.
> Unless there is a bug in the compiler, the use of "unaligned" will never
> cause an exception.

Oh, you're right of course. I didn't catch that part of the argument.

> Instead, "unaligned" will simulate an unaligned load with two loads and
> some rotation etc. On the ARM, where every mnemonic can rotate operands,
> this isn't that bad of a penalty.
> 
> Therefore, I wouldn't be surprised if, even on ARM, arrays with packed
> structures are faster than arrays with unpacked structures.
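
As a rough illustration of the "two loads and some rotation" idea (written in C rather than Pascal, and not the code FPC actually generates), a 32-bit value at an arbitrary byte offset can be assembled from two aligned word loads. The helper name load_u32_two_aligned is hypothetical, and the sketch assumes little-endian byte order and that the following word is valid to read whenever the offset is not a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fetch the 32-bit little-endian value that starts at
   byte offset `off` inside an array of aligned 32-bit words, using only
   aligned word loads plus shifts. */
static uint32_t load_u32_two_aligned(const uint32_t *words, size_t off)
{
    size_t   idx   = off / 4;                 /* word holding the first byte */
    unsigned shift = (unsigned)(off % 4) * 8; /* bit offset inside that word */
    uint32_t lo    = words[idx];              /* first aligned load */

    if (shift == 0)
        return lo;                            /* already aligned: one load */

    uint32_t hi = words[idx + 1];             /* second aligned load */

    /* Low bytes come from the top of `lo`, high bytes from the bottom of `hi`. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

For the words 0x44332211 and 0x88776655, a read at byte offset 1 yields 0x55443322: two loads, two shifts and an OR, with no trap involved.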

That's possible. Why would it be faster, btw? Better cache coherency?
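
One possible answer, going by the cache-miss argument quoted earlier, is density rather than coherency: packed elements are smaller, so an array of them spans fewer cache lines. A minimal C sketch under stated assumptions: the two record layouts and the cache_lines_spanned helper are hypothetical, __attribute__((packed)) is a GCC/Clang extension standing in for Pascal's "packed record", and the 64-byte line size is only an example value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record with natural alignment: on typical ABIs,
   3 padding bytes follow `kind` so that `value` is 4-byte aligned. */
struct cell_unpacked {
    uint8_t  kind;
    uint32_t value;
};

/* Same fields without padding (GCC/Clang extension). */
struct cell_packed {
    uint8_t  kind;
    uint32_t value;
} __attribute__((packed));

/* Number of cache lines covered by `count` contiguous elements,
   assuming the array starts on a cache-line boundary. */
static size_t cache_lines_spanned(size_t elem_size, size_t count,
                                  size_t line_size)
{
    return (elem_size * count + line_size - 1) / line_size;
}
```

On typical ABIs the unpacked record is 8 bytes and the packed one 5, so 1024 elements cover 128 versus 80 lines of 64 bytes. Whether that saving outweighs the extra instructions per unaligned field access depends on the access pattern and the CPU.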

-- 
Kind regards
Christian Iversen


