[fpc-devel] Optimisation and memory alignment question

Sun Feb 28 11:56:49 CET 2021

On 28.02.21 at 11:11, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
> 
> So to get to the point, I've spotted another potential peephole 
> optimisation specifically on x86_64:
> 
>      movq    (%rdx),%rax
>      shrq    $32,%rax
> 
> Is it acceptable to change this to the following?
> 
>      movl    4(%rdx),%eax

Yes. If (%rdx) is naturally aligned (that is, to an 8-byte boundary),
4(%rdx) is aligned to at least a 4-byte boundary and is thus naturally
aligned for a 32-bit load.
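
The alignment argument is plain address arithmetic. As a rough sketch in
C (the helper names is_aligned and upper_half_addr are made up for
illustration and are not part of FPC), assuming a little-endian target
where the upper half of a 64-bit value sits at offset 4:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Hypothetical helper: address of the upper 32-bit half of a 64-bit
   value stored at qword_addr, assuming little-endian byte order. */
static uintptr_t upper_half_addr(uintptr_t qword_addr)
{
    return qword_addr + 4;
}
```

For any 8-byte-aligned qword_addr, is_aligned(upper_half_addr(qword_addr), 4)
holds, while the same address is never 8-byte aligned, which a 32-bit
load does not need.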

> 
> Logically it's equivalent thanks to the guarantee that the upper 32-bits 
> of the destination register will be zeroed, but I know sometimes there 
> might be a penalty for reading from memory that isn't aligned to a 
> 16-byte boundary, say.

x86 is very robust against misalignment, and the example code is
naturally aligned anyway. Any alignment beyond natural alignment is
coincidental.
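
To illustrate the value equivalence of the two instruction sequences, a
small C sketch may help. The function names are hypothetical, memcpy
stands in for the loads, and the equality only holds on a little-endian
target such as x86_64. This models the register result only, not the
compiler's actual peephole code:

```c
#include <stdint.h>
#include <string.h>

/* Models: movq (%rdx),%rax ; shrq $32,%rax */
static uint64_t load64_shift32(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* 64-bit load */
    return v >> 32;            /* keep the upper 32 bits */
}

/* Models: movl 4(%rdx),%eax
   Writing a 32-bit register zeroes the upper half of the 64-bit one,
   which the widening conversion on return reproduces here. */
static uint64_t load32_at_offset4(const void *p)
{
    uint32_t v;
    memcpy(&v, (const unsigned char *)p + 4, sizeof v);  /* 32-bit load */
    return v;                  /* zero-extended to 64 bits */
}
```

On a little-endian machine both functions return the same value for any
8 readable bytes at p.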

> 
> A "movl; shrl $16" version may be possible with movzx, but I'm not 
> certain if that will be even more inefficient due to the offset now 
> being 2 rather than 4.
> 
> Gareth aka. Kit
> 
>