[fpc-devel] Optimisation and memory alignment question

Sun Feb 28 11:11:09 CET 2021

Hi everyone,

So to get to the point, I've spotted another potential peephole 
optimisation specifically on x86_64:

     movq    (%rdx),%rax
     shrq    $32,%rax

Is it acceptable to change this to the following?

     movl    4(%rdx),%eax

Logically it's equivalent, since a write to a 32-bit register is 
guaranteed to zero the upper 32 bits of the full 64-bit register, but I 
know there can sometimes be a penalty for reading from memory that isn't 
aligned to, say, a 16-byte boundary.
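
To make the claimed equivalence concrete, here is a minimal C sketch of 
the two forms. It is only an illustration, not compiler code: the 
function names are hypothetical, and it assumes a little-endian host 
such as x86_64, since the rewrite relies on the upper half of the 
quadword living at offset 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors "movq (%rdx),%rax; shrq $32,%rax": load 8 bytes, keep the upper half. */
static uint64_t hi_via_shift(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);      /* memcpy keeps unaligned access well-defined in C */
    return v >> 32;
}

/* Mirrors "movl 4(%rdx),%eax": load 4 bytes at offset 4, zero-extended to 64 bits. */
static uint64_t hi_via_offset_load(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p + 4, sizeof v);
    return (uint64_t)v;           /* same effect as the implicit zeroing of the upper half */
}

/* Compares both forms at every byte offset of a small buffer (little-endian only). */
static void check_forms_agree(void)
{
    unsigned char buf[32];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(i * 37 + 11);
    for (size_t off = 0; off + 8 <= sizeof buf; off++)
        assert(hi_via_shift(buf + off) == hi_via_offset_load(buf + off));
}
```

Two things the sketch does not capture: the shift updates the flags 
while a plain move does not, so the rewrite is only safe if nothing 
later reads those flags; and the 4 bytes read by the narrower load lie 
entirely within the 8 bytes read by the original, so it cannot cross a 
cache-line boundary that the original load did not.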

A similar rewrite of "movl; shrl $16" may be possible using movzx, but 
I'm not certain whether that would be even less efficient, since the 
offset would then be 2 rather than 4.
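
The 16-bit variant can be sketched in C in the same way. Again, this is 
only an illustration with hypothetical function names, assuming a 
little-endian host; the second function corresponds to something like 
"movzwl 2(%rdx),%eax".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors "movl (%rdx),%eax; shrl $16,%eax": load 4 bytes, keep the upper half. */
static uint32_t hi16_via_shift(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v >> 16;
}

/* Mirrors a zero-extending 16-bit load from offset 2. */
static uint32_t hi16_via_movzx(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p + 2, sizeof v);
    return (uint32_t)v;           /* zero-extension, as movzx performs */
}

/* Compares both forms at every byte offset of a small buffer (little-endian only). */
static void check_hi16_forms_agree(void)
{
    unsigned char buf[16];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)(i * 53 + 7);
    for (size_t off = 0; off + 4 <= sizeof buf; off++)
        assert(hi16_via_shift(buf + off) == hi16_via_movzx(buf + off));
}
```

This only shows that the two forms produce the same value; whether the 
movzx form is actually faster is the open question and would need 
measuring on real hardware.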

Gareth aka. Kit
