[fpc-devel] Optimisation and memory alignment question
Florian Klämpfl
florian at freepascal.org
Sun Feb 28 11:56:49 CET 2021
On 28.02.21 at 11:11, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> So to get to the point, I've spotted another potential peephole
> optimisation specifically on x86_64:
>
> movq (%rdx),%rax
> shrq $32,%rax
>
> Is it acceptable to change this to the following?
>
> movl 4(%rdx),%eax
Yes. If (%rdx) is naturally aligned (that is, to an 8-byte boundary), then
4(%rdx) is aligned to at least a 4-byte boundary, so the 32-bit load is
naturally aligned as well.
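
The alignment argument can be checked with a little address arithmetic. This is only a sketch in C; the helper names is_aligned and high_half_is_aligned are made up for illustration and are not part of FPC or its peephole optimizer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of size; size is assumed to be a power of two. */
static bool is_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}

/* Checks the claim for one base address: if an 8-byte access at base is
   naturally aligned, a 4-byte access at base + 4 is naturally aligned too. */
static bool high_half_is_aligned(uintptr_t base)
{
    if (!is_aligned(base, 8))
        return true;            /* premise not met, nothing to check */
    return is_aligned(base + 4, 4);
}
```

For example, base 0x1000 is 8-byte aligned and 0x1004 is 4-byte aligned, although 0x1004 is not 8-byte aligned, which is fine because a 32-bit load only needs 4-byte alignment to be natural.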
>
> Logically it's equivalent thanks to the guarantee that the upper 32 bits
> of the destination register will be zeroed, but I know sometimes there
> might be a penalty for reading from memory that isn't aligned to a
> 16-byte boundary, say.
x86 is very tolerant of misaligned accesses, and the load in the example
is naturally aligned anyway. Any alignment beyond natural alignment is
coincidental.
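
The logical equivalence of the two sequences can be modelled in C as below. This is a rough sketch under the assumption of a little-endian target (as x86_64 is); the function names are hypothetical, and memcpy stands in for the raw loads. It does not model the flags, which shrq writes and movl leaves untouched, so a real peephole pass would presumably also have to make sure the flags are not used afterwards.

```c
#include <stdint.h>
#include <string.h>

/* Models: movq (%rdx),%rax ; shrq $32,%rax */
static uint64_t load_then_shift(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);    /* 8-byte load from p */
    return v >> 32;             /* keep the upper half */
}

/* Models: movl 4(%rdx),%eax (writing %eax zeroes the upper half of %rax) */
static uint64_t load_high_dword(const void *p)
{
    uint32_t v;
    memcpy(&v, (const unsigned char *)p + 4, sizeof v);  /* 4-byte load from p + 4 */
    return v;                   /* zero-extended to 64 bits */
}
```

On a little-endian machine both functions return the same value for any 8 readable bytes at p, and the second one touches only bytes that the first one already read.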
>
> A "movl; shrl $16" version may be possible with movzx, but I'm not
> certain if that will be even more inefficient due to the offset now
> being 2 rather than 4.
>
> Gareth aka. Kit
>
>