[fpc-devel] @J. Gareth / Kit: Idea for peephole

Martin Frb lazarus at mfriebe.de
Wed Oct 1 11:59:04 CEST 2025


Hi Gareth,

there have been so many improvements to the peephole optimiser (big 
thanks for them) that I was at first surprised it didn't pick up the below.
But then, of course, it's a load from memory, and memory could change.

The Pas code is not yet of interest (too complex / refactored version of 
components\lazutils\lazlistclasses.pas).
If this piques your interest, I can try to extract something.
(It's getting 2 fields from a pointer to record)

The asm (3.3.1 about a month old -O3):

00000001000025E7 4C8B542420               mov r10,[rsp+$20]
00000001000025EC 458B5A0C                 mov r11d,[r10+$0C]
00000001000025F0 4C8B542420               mov r10,[rsp+$20]
00000001000025F5 410312                   add edx,[r10]

Note it's loading [rsp+$20] into r10,
and then 2 lines later loading the same value again, while it is still 
in r10.

If you are interested in expanding the peephole for this, then the 
questions are

1) Can the peephole identify the equality of the source [rsp+$20]?
Afaik, if it was loading from another register, and that hadn't changed, 
then it would not load again? But I am not sure.
If so, can it detect, on the 2nd load, that "[rsp+$20]" is already 
in r10?
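
To make question 1 concrete, here is a minimal sketch in Python (not FPC 
code, and not how the peephole optimiser is actually implemented). It 
tracks which memory operand each register was last loaded from and flags 
a mov that reloads the same operand into the same register. The function 
name and the (op, dest, src) tuple form are made up for illustration. 
Operands are compared as plain text, sub-register aliasing (r10 vs r10d) 
is ignored, and any store to memory conservatively forgets everything.

```python
def find_redundant_loads(instrs):
    """Return indices of loads that repeat an earlier, still valid load.

    instrs is a list of (op, dest, src) tuples in Intel operand order.
    Hypothetical helper for illustration only.
    """
    known = {}        # register -> memory operand it was last loaded from
    redundant = []
    for i, (op, dest, src) in enumerate(instrs):
        is_load = op == "mov" and src.startswith("[") and not dest.startswith("[")
        if is_load and known.get(dest) == src:
            redundant.append(i)   # same register, same source, nothing changed
            continue
        if dest.startswith("["):
            known.clear()         # a store: any tracked memory may have changed
            continue
        # dest is overwritten: forget its old content and every tracked
        # load whose address expression mentions dest (textual check only).
        known.pop(dest, None)
        known = {r: m for r, m in known.items() if dest not in m}
        if is_load and dest not in src:
            known[dest] = src
    return redundant


# The four instructions from the listing above.
SAMPLE = [
    ("mov", "r10", "[rsp+$20]"),
    ("mov", "r11d", "[r10+$0C]"),
    ("mov", "r10", "[rsp+$20]"),
    ("add", "edx", "[r10]"),
]
```

On SAMPLE this flags only the third instruction (index 2). If r10 were 
modified in between, the second load would not be flagged.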

2) Can it detect that there was no code that wrote to memory?
Of course, if any code had written to memory, then [rsp+$20] may have 
changed.
But if there is none, then it cannot have changed.

2a) Well, it could change in another thread.
But if
- there is no lock'ed access (interlock....), nor read/write barrier
- there is no condition on it
- there is no jmp / conditional jump between the 2 statements
Then it should be safe.
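
The rules from 2) and 2a) could be checked roughly as follows. This is 
again only a Python sketch under my own assumptions, not the compiler's 
logic: instructions are plain Intel-syntax strings, the mnemonic sets are 
hypothetical and far from complete, and the check is deliberately 
conservative (any instruction whose first operand is a memory reference 
counts as a possible write, even cmp or test).

```python
# Hypothetical, incomplete mnemonic sets; for illustration only.
BARRIERS = {"mfence", "lfence", "sfence", "xchg", "cmpxchg", "xadd"}
CONTROL_FLOW = {"call", "ret", "loop", "syscall"}
STACK_WRITERS = {"push"}


def blocks_reuse(line):
    """True if this instruction should stop us reusing an earlier load."""
    mnemonic, _, rest = line.strip().partition(" ")
    mnemonic = mnemonic.lower()
    dest = rest.split(",")[0]          # first operand in Intel syntax
    if mnemonic == "lock" or mnemonic in BARRIERS:
        return True                    # locked access or memory barrier
    if mnemonic.startswith("j") or mnemonic in CONTROL_FLOW:
        return True                    # jmp / conditional jump / call
    if mnemonic in STACK_WRITERS:
        return True                    # implicit write to the stack
    return "[" in dest                 # first operand is memory: assume a write


def safe_to_reuse(between):
    """between: the instructions located between the two identical loads."""
    return not any(blocks_reuse(line) for line in between)
```

For the listing above, the only instruction between the two loads is 
"mov r11d,[r10+$0C]", which reads memory but does not write it, so 
safe_to_reuse returns True.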

That is, keeping the value is just saying the other thread has not yet 
changed it, which is always one correct scenario.
If any other thread changes it, then it's subject to a race condition 
anyway...

If there are conditions or jumps, then it could be a spinlock (with 
dirty read). So then it shouldn't be optimized.

------------
Bit of extra info: I can optimise it in my code, but I lose some 
readability.
I was checking how well the compiler handles various inlines, when I saw 
this.

The fields of the record are properties with getter functions that are 
inlined. Hence the code does not use local vars for the pointer. 
Otherwise the [rsp+$20] would likely be in a register, and probably be 
optimised.

Because the record has a self (pointer to the record is internally 
taken), the compiler does not optimize the value into a register. That 
is why it is taken from the stackframe each time.
But (with the above rules checked), even if other code had a copy of the 
pointer, such other code would not have run.

Again, yes, it could in another thread => but then the code itself is 
buggy, as the outcome is undefined (a race condition), and the 
optimization just decides for one of the 2 valid outcomes.

I don't know how often code like this happens, but with advanced records 
being used, it could be regular. (there are over a dozen in the 
lazlistclasses)

Match: (mov[a-z]*\s+\d+\(%rsp\),%[a-z0-9]+)[\r\n]+([ \t\#][^\r\n]*[\r\n]+){0,7}\s*\1

Of course the better fix would be if the register allocator were much 
smarter. But that seems further away.
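
For anyone who wants to try the "Match:" expression above, here is a 
small Python sketch that runs it over AT&T-syntax assembler text. The 
sample is my own hand translation of the four instructions from the 
listing (32 = $20, 12 = $0C), not real compiler output, and I am assuming 
the expression behaves the same under Python's re module as in the tool 
it was written for. The match is purely textual: it does not check for 
stores or jumps in between, so it only counts candidates.

```python
import re

# The expression from the post, unchanged: a mov from N(%rsp) into a
# register, up to 7 lines in between, then the very same mov text again.
PATTERN = re.compile(
    r"(mov[a-z]*\s+\d+\(%rsp\),%[a-z0-9]+)[\r\n]+"
    r"([ \t\#][^\r\n]*[\r\n]+){0,7}\s*\1"
)

# Hand-written AT&T translation of the listing (an assumption, see above).
SAMPLE = (
    "\tmovq\t32(%rsp),%r10\n"
    "\tmovl\t12(%r10),%r11d\n"
    "\tmovq\t32(%rsp),%r10\n"
    "\taddl\t(%r10),%edx\n"
)


def find_reloads(asm_text):
    """Return the text of each load that is repeated within 7 lines."""
    return [m.group(1) for m in PATTERN.finditer(asm_text)]


if __name__ == "__main__":
    for load in find_reloads(SAMPLE):
        print("repeated load:", load)
```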


