[fpc-devel] Some handy information regarding LEA instructions
J. Gareth Moreton
gareth at moreton-family.com
Fri Sep 22 19:14:12 CEST 2023
Hi everyone,
I just discovered this while trying to optimise some of the hash
functions. This might already be known, but in case it isn't, here's
something useful to know.
The LEA instruction is useful because it can perform "x := y + z +
const" in a single instruction, or just "x := y + z" or "x := y + const"
when the destination register differs from the source registers.
However, on Sandy Bridge and later Intel processors (I'm not sure about
AMD processors), the 3-operand form (base + index + constant) has a
3-cycle latency and can only be dispatched to one execution port, which
reduces concurrency if there are nearby instructions that fetch
addresses. The 2-operand form (whether reg/reg or reg/const) has only a
single-cycle latency and can be dispatched to at least two different
ports.
Long story short, if you have something like:
    LEA ECX, [ECX + EAX + $f57c0faf]
    ROL ECX, 7
The ROL instruction depends on the result of the LEA, so there is a
2-cycle delay before it can be executed. However, if you expand the LEA
into two ADD instructions:
    ADD ECX, EAX
    ADD ECX, $f57c0faf
    ROL ECX, 7
Though slightly larger, this triplet executes one cycle faster overall:
each ADD has a single-cycle latency, so ROL can start after 2 cycles
rather than 3.
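As a sanity check on the expansion above, here is a minimal sketch in
Python that models both sequences with 32-bit wrap-around arithmetic and
confirms they leave the same value in ECX. The function names are my own
and purely illustrative. It only models the register value: ADD writes
the flags while LEA leaves them untouched, so the substitution is only
safe where the flags are not in use.

```python
MASK32 = 0xFFFFFFFF   # registers are 32 bits wide, so all sums wrap
CONST = 0xF57C0FAF    # the constant from the example above


def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32), like ROL."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def step_lea(ecx, eax):
    """Model LEA ECX, [ECX + EAX + CONST] followed by ROL ECX, 7."""
    ecx = (ecx + eax + CONST) & MASK32
    return rol32(ecx, 7)


def step_add(ecx, eax):
    """Model ADD ECX, EAX; ADD ECX, CONST; ROL ECX, 7."""
    ecx = (ecx + eax) & MASK32
    ecx = (ecx + CONST) & MASK32
    return rol32(ecx, 7)


if __name__ == "__main__":
    samples = [
        (0, 0),
        (1, 2),
        (0xFFFFFFFF, 1),            # first addition overflows
        (0x80000000, 0x80000000),   # sum wraps to zero before the constant
        (0xDEADBEEF, 0x12345678),
    ]
    for ecx, eax in samples:
        assert step_lea(ecx, eax) == step_add(ecx, eax)
    print("LEA form and ADD/ADD form agree on all samples")
```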
The 3-operand LEA instruction is still useful in a few cases though:
- If the destination register differs from both source registers,
since expanding it into arithmetic/logical instructions would require an
additional MOV instruction, which doesn't offer any speed bonus and just
increases code size.
- In cases where the destination is the same as one of the source
registers, as long as the destination isn't used for at least 3 cycles,
then it is a saving (minimising concurrent uses of the AGU execution
ports also helps).
- And of course, if one of the registers has a scale factor
(multiplier), then this is also faster than the equivalent
arithmetic/logical instructions.
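The conditions above could be turned into a decision rule along the
following lines. This is only a rough sketch in Python, not FPC's actual
peephole code: the Lea record, the should_expand and expand helpers, and
the cycles_until_dest_used and flags_live inputs are all hypothetical
names, and a real optimiser would have to derive the last two from its
own analysis. The flags check is there because ADD writes the flags and
LEA does not.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lea:
    """Hypothetical description of LEA dest, [base + index*scale + disp]."""
    dest: str
    base: Optional[str]
    index: Optional[str]
    scale: int = 1
    disp: int = 0


def is_three_component(lea):
    """True for the slow form: base, index and a non-zero constant."""
    return lea.base is not None and lea.index is not None and lea.disp != 0


def should_expand(lea, cycles_until_dest_used, flags_live):
    """Decide whether to replace the LEA with two ADD instructions."""
    if not is_three_component(lea):
        return False  # already the fast single-cycle form
    if lea.scale != 1:
        return False  # a scaled index would need extra shift instructions
    if lea.dest not in (lea.base, lea.index):
        return False  # expansion would need an additional MOV
    if flags_live:
        return False  # ADD overwrites flags that later code still reads
    if cycles_until_dest_used >= 3:
        return False  # the 3-cycle latency is hidden anyway
    return True


def expand(lea):
    """Return the two-ADD replacement (assumes should_expand was true)."""
    other = lea.index if lea.dest == lea.base else lea.base
    return [f"ADD {lea.dest}, {other}", f"ADD {lea.dest}, ${lea.disp:x}"]


if __name__ == "__main__":
    lea = Lea(dest="ECX", base="ECX", index="EAX", disp=0xF57C0FAF)
    if should_expand(lea, cycles_until_dest_used=0, flags_live=False):
        for line in expand(lea):
            print(line)
```

With the LEA from the earlier example, where ECX is rotated immediately
afterwards, the rule chooses expansion and yields the same two ADD
instructions shown above.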
With all this in mind I'll have a ponder about introducing a new
peephole optimisation that expands potentially slow LEA instructions.
Kit