[fpc-devel] @J. Gareth / Kit: Idea for peephole
Martin Frb
lazarus at mfriebe.de
Wed Oct 1 11:59:04 CEST 2025
Hi Gareth,
there have been so many improvements to the peephole optimiser (big
thanks for them) that I was at first surprised it didn't pick up the
case below.
But then, of course, it's a load from memory, and memory could change.
The Pascal code is not yet of interest (too complex; it is a refactored
version of components\lazutils\lazlistclasses.pas).
If this piques your interest, I can try to extract something.
(It's getting 2 fields from a pointer to a record.)
The asm (FPC 3.3.1, about a month old, -O3):
00000001000025E7 4C8B542420 mov r10,[rsp+$20]
00000001000025EC 458B5A0C mov r11d,[r10+$0C]
00000001000025F0 4C8B542420 mov r10,[rsp+$20]
00000001000025F5 410312 add edx,[r10]
Note it's loading [rsp+$20] into r10,
and then 2 lines later loading the same value again, although it is
still in r10.
If you are interested in expanding the peephole for this, then the
questions are:
1) Can the peephole identify that the source [rsp+$20] is the same?
Afaik, if it was loading from another register, and that register
hadn't changed, then it would not load again? But I am not sure.
If so, can it detect on the 2nd load that "[rsp+$20]" is already
in r10?
2) Can it detect that no code in between wrote to memory?
Of course, if any code had written to memory, then [rsp+$20] may have
changed.
But if there is none, then it cannot have changed.
2a) Well, it could change in another thread.
But if
- there is no lock'ed access (interlock....), nor read/write barrier
- there is no condition on it
- there is no jmp / conditional jump between the 2 statements
then it should be safe.
That is, keeping the value just says the other thread has not yet
changed it, which is always one correct scenario.
If any other thread changes it, then it is a race condition
anyway...
If there are conditions or jumps, then it could be a spinlock (with
dirty read). So then it shouldn't be optimized.
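
To make the rules above concrete, here is a rough sketch in Python (not
FPC code, and not how the peephole optimiser is actually structured).
The Ins type, the operand strings and all function names are made up for
illustration. It is deliberately conservative: any store, any lock'ed
instruction and any mnemonic it does not know (jumps, calls, cmp,
fences, ...) makes it forget everything.

```python
import re
from dataclasses import dataclass

# Only these mnemonics are "understood"; each one writes its first operand.
# Anything else (jumps, calls, cmp, push/pop, fences, ...) ends the tracking.
SIMPLE_OPS = {"mov", "add", "sub", "and", "or", "xor", "lea"}

@dataclass(frozen=True)
class Ins:
    op: str               # mnemonic
    dst: str              # first operand, e.g. "r10" or "[rax]"
    src: str = ""         # second operand, e.g. "[rsp+$20]"
    locked: bool = False  # instruction carries a lock prefix

def is_mem(operand):
    return operand.startswith("[")

def reg_family(name):
    """Crude alias handling: r11d/r11w/r11b -> r11, edx -> rdx."""
    name = name.lower()
    if name[:1] == "r" and name[1:2].isdigit():
        return name.rstrip("dwb")
    if len(name) == 3 and name[0] == "e":
        return "r" + name[1:]
    return name

def addr_regs(mem):
    """Registers used to form the address of a memory operand."""
    text = re.sub(r"\$[0-9a-f]+", "", mem.lower().strip("[]"))
    return {reg_family(r) for r in re.findall(r"[a-z][a-z0-9]*", text)}

def redundant_reloads(block):
    """Indices of loads that repeat an earlier load which is still valid."""
    known = {}  # memory operand text -> register that holds its value
    hits = []
    for i, ins in enumerate(block):
        if ins.op not in SIMPLE_OPS or ins.locked:
            known.clear()  # unknown, control flow or lock'ed: forget all
            continue
        is_load = ins.op == "mov" and is_mem(ins.src) and not is_mem(ins.dst)
        if is_load and known.get(ins.src) == ins.dst.lower():
            hits.append(i)  # same source, same register, nothing in between
            continue
        if is_mem(ins.dst):
            known.clear()  # a store may alias any tracked location
            continue
        written = reg_family(ins.dst)
        # Drop entries whose value register or address register was overwritten.
        known = {m: r for m, r in known.items()
                 if reg_family(r) != written and written not in addr_regs(m)}
        if is_load and written not in addr_regs(ins.src):
            known[ins.src] = ins.dst.lower()
    return hits

example = [
    Ins("mov", "r10", "[rsp+$20]"),
    Ins("mov", "r11d", "[r10+$0C]"),
    Ins("mov", "r10", "[rsp+$20]"),
    Ins("add", "edx", "[r10]"),
]
print(redundant_reloads(example))  # [2]
```

For the four instructions from the listing, only the second load of
[rsp+$20] (index 2) is reported. Put a store, a jump or a write to r10
between the two loads and nothing is reported.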
------------
A bit of extra info: I can optimise it in my code, but I lose some
readability.
I was checking how well the compiler handles various inlines, when I saw
this.
The fields of the record are properties with getter functions that are
inlined. Hence the code does not use local vars for the pointer.
Otherwise the [rsp+$20] would likely be in a register, and probably be
optimised.
Because the record has a self (pointer to the record is internally
taken), the compiler does not optimize the value into a register. That
is why it is taken from the stackframe each time.
But (with the above rules checked), even if other code had a copy of the
pointer, such other code would not have run.
Again, yes, it could run in another thread => but then the code itself
is buggy, as the outcome is undefined (a race condition), and the
optimization just picks one of the 2 valid outcomes.
I don't know how often code like this happens, but with advanced records
being used, it could be common. (There are over a dozen occurrences in
lazlistclasses.)
Match: (mov[a-z]*\s+\d+\(%rsp\),%[a-z0-9]+)[\r\n]+([ \t\#][^\r\n]*[\r\n]+){0,7}\s*\1
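
As an illustration of how the pattern can be used, here is a small
Python sketch. The pattern is joined into one line, with the character
class that was wrapped across the line break read as space, tab and "#".
The sample text is a hand-written AT&T-syntax version of the four
instructions from the listing, not real compiler output, and the
function name is made up. Note that the regex only finds candidates (the
same load from the stack repeated within a few lines); it does not check
any of the safety rules.

```python
import re

# The pattern from the mail, joined into a single line.
PATTERN = re.compile(
    r"(mov[a-z]*\s+\d+\(%rsp\),%[a-z0-9]+)[\r\n]+"
    r"([ \t\#][^\r\n]*[\r\n]+){0,7}\s*\1"
)

def find_reload_candidates(asm_text):
    """Return each load from the stack that is repeated shortly after."""
    return [m.group(1) for m in PATTERN.finditer(asm_text)]

# Hand-written sample in AT&T syntax (an assumption, for illustration only).
sample = (
    "\tmovq\t32(%rsp),%r10\n"
    "\tmovl\t12(%r10),%r11d\n"
    "\tmovq\t32(%rsp),%r10\n"
    "\taddl\t(%r10),%edx\n"
)

for hit in find_reload_candidates(sample):
    print(repr(hit))  # 'movq\t32(%rsp),%r10'
```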
Of course the better fix would be if the register allocator were much
smarter. But that seems further away.