[fpc-devel] Question about memory alignment (again!)

Stefan Glienke sglienke at dsharp.org
Thu Aug 18 09:48:43 CEST 2022


Interestingly, clang performs the same transformation:

https://godbolt.org/z/Y4v14f9s3
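
For reference, here is a rough C sketch of the equivalence being asked about in the quoted mail below. The function names are made up for illustration, and the whole thing assumes a little-endian target such as x86. The "shift" versions correspond to the movl + shrl/sarl pairs and the "load" versions to the single movzbl/movsbl. A one-byte access is always naturally aligned, so the byte loads here have no alignment requirement of their own.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit load followed by a logical shift,
   i.e. the "movl (%rbx),%eax; shrl $24,%eax" pair. */
static uint32_t top_byte_shift(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* alignment-safe 32-bit load */
    return v >> 24;
}

/* Hypothetical helper: single zero-extending byte load,
   i.e. "movzbl 3(%rbx),%eax". Only equivalent on little-endian. */
static uint32_t top_byte_load(const void *p) {
    return ((const unsigned char *)p)[3];
}

/* Signed variant: 32-bit load followed by an arithmetic shift
   ("movl; sarl $24"). Right-shifting a negative value is
   implementation-defined in C; mainstream compilers shift arithmetically. */
static int32_t top_byte_sar(const void *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v >> 24;
}

/* Signed variant as a single sign-extending byte load ("movsbl 3(%rbx),%eax"). */
static int32_t top_byte_load_signed(const void *p) {
    return ((const signed char *)p)[3];
}
```

On a little-endian machine both functions in each pair should return the same value for any 4-byte input, which is what makes the single-instruction form a candidate in the first place.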


> On 17/08/2022 02:21 CEST J. Gareth Moreton via fpc-devel <fpc-devel at lists.freepascal.org> wrote:
> 
>  
> Hi everyone,
> 
> Recently I've made some optimisations centred around the SHR instruction 
> on x86, and there was one pair of instructions that caught my attention:
> 
> movl (%rbx),%eax
> shrl $24,%eax
> 
> Since x86 is little-endian, is it permissible to optimise this to:
> 
> movzbl 3(%rbx),%eax
> 
> (You could also optimise "movl; sarl" into a "movsbl" instruction this way.)
> 
> Logically the result is the same and it removes an instruction and a 
> pipeline stall, but will there be a performance hit that comes from 
> reading an unaligned byte of memory like that?
> 
> I did make a similar optimisation once before with QWords using the 
> implicit zero-extension of the 32-bit MOV instruction - that is:
> 
> movq (%rbx),%rax
> shrq $32,%rax
> 
> To:
> 
> movl 4(%rbx),%eax
> 
> This one is a little nicer though because it's still on a 32-bit 
> boundary and so was permissible.
> 
> Gareth aka. Kit
> 
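
The QWord case from the quoted mail can be sketched the same way. Again this is only an illustration with made-up function names, assuming a little-endian target: the upper half of a 64-bit value sits at byte offset 4, so a 32-bit load from there (zero-extended into the 64-bit register) gives the same result as the 64-bit load plus shift.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit load followed by a shift,
   i.e. "movq (%rbx),%rax; shrq $32,%rax". */
static uint64_t high_dword_shift(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* alignment-safe 64-bit load */
    return v >> 32;
}

/* Hypothetical helper: 32-bit load from offset 4, zero-extended to 64 bits,
   i.e. "movl 4(%rbx),%eax". Only equivalent on little-endian. */
static uint64_t high_dword_load(const void *p) {
    uint32_t hi;
    memcpy(&hi, (const unsigned char *)p + 4, sizeof hi);
    return hi;                 /* widening conversion zero-extends */
}
```

If the original 8-byte value is 8-byte aligned, the address at offset 4 is still 4-byte aligned, which matches the "still on a 32-bit boundary" remark above.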

