[fpc-devel] Some handy information regarding LEA instructions

J. Gareth Moreton gareth at moreton-family.com
Fri Sep 22 19:14:12 CEST 2023


Hi everyone,

I just discovered this while trying to optimise some of the hash 
functions.  This might already be known, but in case it isn't, here's 
something useful to know.

The LEA instruction is useful because you can essentially perform "x := 
y + z + const" with one instruction, or just "x := y + z" or "x := y + 
const", even when the destination register differs from the source 
registers.  However, on Sandy Bridge and later (not sure about AMD 
processors) the 3-operand version (base + index + constant) has a 
3-cycle latency and can be dispatched to only one execution port (which 
reduces concurrency if there are nearby instructions that fetch 
addresses), while the 2-operand version (whether reg/reg or reg/const) 
has only a single-cycle latency and can be dispatched to at least two 
different ports.

Long story short, if you have something like:

LEA ECX, [ECX + EAX + $f57c0faf]
ROL ECX, 7

ROL has to wait out the 3-cycle latency of LEA, which is a 2-cycle 
delay compared with a single-cycle instruction.  However, if you expand 
LEA into two ADD instructions:

ADD ECX, EAX
ADD ECX, $f57c0faf
ROL ECX, 7

Though slightly larger, this triplet executes one cycle faster overall: 
each ADD has a single-cycle latency, so ROL can start after 2 cycles 
instead of 3.
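
To make the equivalence and the cycle counts concrete, here is a small 
C model of the two sequences.  It is only a sketch: the function names 
are made up for illustration, and the latency constants are simply the 
figures quoted above (3 cycles for the 3-operand LEA, 1 cycle per ADD), 
not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits (0 < n < 32), i.e. what ROL does to a 32-bit register. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Models: LEA ECX, [ECX + EAX + $f57c0faf] / ROL ECX, 7 */
static uint32_t lea_then_rol(uint32_t ecx, uint32_t eax)
{
    ecx = ecx + eax + 0xf57c0fafu;   /* one LEA; the sum wraps modulo 2^32 */
    return rol32(ecx, 7);
}

/* Models: ADD ECX, EAX / ADD ECX, $f57c0faf / ROL ECX, 7 */
static uint32_t adds_then_rol(uint32_t ecx, uint32_t eax)
{
    ecx += eax;                      /* first ADD */
    ecx += 0xf57c0fafu;              /* second ADD */
    return rol32(ecx, 7);
}

/* Assumed latencies, taken from the figures quoted in this post. */
enum { LEA3_LATENCY = 3, ADD_LATENCY = 1 };

/* Cycles that pass before ROL's input is ready in each variant. */
static int cycles_before_rol_lea(void)  { return LEA3_LATENCY; }
static int cycles_before_rol_adds(void) { return ADD_LATENCY + ADD_LATENCY; }

/* Sanity check: both sequences agree, including when the sums wrap around. */
static void check_equivalence(void)
{
    assert(lea_then_rol(0u, 0u) == adds_then_rol(0u, 0u));
    assert(lea_then_rol(0xffffffffu, 0x12345678u) ==
           adds_then_rol(0xffffffffu, 0x12345678u));
}
```

One thing the model doesn't show: ADD writes the flags while LEA leaves 
them untouched, so the swap is only safe when the flags aren't needed 
afterwards.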

The 3-operand LEA instruction is still useful in a few cases though:

     - If all the registers are different, since expanding it into 
arithmetic/logical instructions would require an additional MOV 
instruction, which doesn't offer any speed bonus and just increases 
code size.

     - In cases where the destination is the same as one of the source 
registers, as long as the destination isn't used for at least 3 cycles, 
the single LEA is still a saving (minimising concurrent uses of the AGU 
execution ports also helps).

     - And of course, if one of the registers has a scale factor 
(multiplier), then LEA is also faster than the equivalent 
arithmetic/logical instructions.

With all this in mind I'll have a ponder about introducing a new 
peephole optimisation that expands potentially slow LEA instructions.
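
As a rough idea of what such a check might look like, here is a sketch 
in C that encodes the three cases above as a yes/no decision.  
Everything in it is hypothetical (the LeaInfo record, the field names, 
the function names and the way "cycles until the destination is used" 
is passed in); it is not how the compiler's peephole optimiser 
represents instructions.  The flags test is an extra condition not 
listed above: ADD modifies the flags and LEA doesn't.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_REG (-1)   /* hypothetical marker for "operand not present" */

/* Hypothetical description of the LEA instruction being inspected. */
typedef struct {
    int  dest;      /* destination register number */
    int  base;      /* base register, or NO_REG */
    int  index;     /* index register, or NO_REG */
    int  scale;     /* scale factor on the index: 1, 2, 4 or 8 */
    bool has_disp;  /* true if there is a constant displacement */
} LeaInfo;

/* The potentially slow form: base + index + constant. */
static bool is_three_operand(const LeaInfo *lea)
{
    return lea->base != NO_REG && lea->index != NO_REG && lea->has_disp;
}

/* Returns true if expanding the LEA into two ADDs looks worthwhile. */
static bool should_expand_lea(const LeaInfo *lea, bool flags_live,
                              int cycles_until_dest_used)
{
    assert(lea->scale == 1 || lea->scale == 2 ||
           lea->scale == 4 || lea->scale == 8);

    if (!is_three_operand(lea))
        return false;   /* the 2-operand forms are already fast */
    if (lea->scale != 1)
        return false;   /* a scale factor would need extra instructions */
    if (lea->dest != lea->base && lea->dest != lea->index)
        return false;   /* all registers differ: would need an extra MOV */
    if (cycles_until_dest_used >= 3)
        return false;   /* the 3-cycle latency is hidden anyway */
    if (flags_live)
        return false;   /* ADD would clobber flags that are still needed */
    return true;
}
```

For the LEA/ROL pair above, the destination matches the base, there is 
no scale factor, and the result is used straight away, so this check 
would say "expand" provided the flags aren't in use.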

Kit


