[fpc-devel] Question about memory alignment (again!)
J. Gareth Moreton
gareth at moreton-family.com
Wed Aug 17 02:21:12 CEST 2022
Hi everyone,
Recently I've made some optimisations centred around the SHR instruction
on x86, and there was one pair of instructions that caught my attention:
movl (%rbx),%eax
shrl $24,%eax
Is it permissible to optimise this to the following (x86 is little-endian)?
movzbl 3(%rbx),%eax
(You could also optimise "movl; sarl $24" into a "movsbl" instruction this way.)
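For reference, a minimal C sketch of the logical equivalence being asked about. It only checks that the results match; it says nothing about performance. The function names are hypothetical, a little-endian host is assumed, and the signed variant assumes the compiler implements ">>" on negative values as an arithmetic shift (true of mainstream x86 compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models "movl (%rbx),%eax; shrl $24,%eax": 32-bit load, logical shift. */
static uint32_t load_then_shr24(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);            /* alignment-safe 32-bit load */
    return v >> 24;
}

/* Models "movzbl 3(%rbx),%eax": zero-extend the byte at offset 3. */
static uint32_t load_byte3_zx(const unsigned char *p) {
    return (uint32_t)p[3];
}

/* Models "movl (%rbx),%eax; sarl $24,%eax".
   Assumption: ">>" on a negative int32_t is an arithmetic shift. */
static int32_t load_then_sar24(const unsigned char *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v >> 24;
}

/* Models "movsbl 3(%rbx),%eax": sign-extend the byte at offset 3. */
static int32_t load_byte3_sx(const unsigned char *p) {
    int32_t b = p[3];
    return b >= 0x80 ? b - 0x100 : b;   /* portable sign extension */
}

/* Hypothetical helper: asserts both equivalences for one 4-byte buffer.
   Only meaningful on a little-endian host. */
static void check_shift24_equivalence(const unsigned char *p) {
    assert(load_then_shr24(p) == load_byte3_zx(p));
    assert(load_then_sar24(p) == load_byte3_sx(p));
}
```

Calling check_shift24_equivalence on any 4-byte buffer should pass on x86; a buffer whose last byte has the top bit set exercises the sign-extending case.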
Logically the result is the same and it removes an instruction and a
pipeline stall, but will there be a performance hit that comes from
reading an unaligned byte of memory like that?
I did make a similar optimisation once before with QWords, using the
implicit zero-extension of the 32-bit MOV instruction - that is:
movq (%rbx),%rax
shrq $32,%rax
To:
movl 4(%rbx),%eax
This one is a little nicer, though, because the narrower load still falls
on a 32-bit boundary and so was permissible.
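The QWord case can be sketched in C the same way. Again the function names are hypothetical and a little-endian host is assumed; the cast to uint64_t stands in for the implicit zero-extension that a 32-bit MOV performs on x86-64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models "movq (%rbx),%rax; shrq $32,%rax": 64-bit load, logical shift. */
static uint64_t load64_then_shr32(const unsigned char *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);            /* alignment-safe 64-bit load */
    return v >> 32;
}

/* Models "movl 4(%rbx),%eax": 32-bit load of the upper half; writing
   %eax zero-extends into %rax, mirrored here by the cast. */
static uint64_t load32_at_offset4(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p + 4, sizeof v);
    return (uint64_t)v;
}

/* Hypothetical helper: asserts the equivalence for one 8-byte buffer.
   Only meaningful on a little-endian host. */
static void check_shift32_equivalence(const unsigned char *p) {
    assert(load64_then_shr32(p) == load32_at_offset4(p));
}
```

If the original 64-bit value is 8-byte aligned, offset 4 is automatically 4-byte aligned, which is why this variant never produces a misaligned 32-bit load.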
Gareth aka. Kit