[fpc-devel] -O3 peephole proposal... run Pass 1 again if Pass 2 returns True

J. Gareth Moreton gareth at moreton-family.com
Sun Feb 28 05:15:58 CET 2021


Just as an example, when compiling the System unit on r48813, there 
exists this block of disassembly:

.Lj4072:
     ...
     leaq    (%rsi,%r13),%rax
     leaq    -1(%rax),%r12
# Peephole Optimization: SubMov2LeaSub
     subq    $1,%rax
     ...

With my improvement over at i38555, the optimiser can remove the sub 
instruction because %rax doesn't get used afterwards, hence:

.Lj4072:
     ...
     jne    .Lj4070
     leaq    (%rsi,%r13),%rax
# Peephole Optimization: SubMov2Lea
     leaq    -1(%rax),%r12
     ...

SubMov2Lea and SubMov2LeaSub are Pass 2 optimisations because of the 
potential to do deeper optimisations on the MOV instruction (which are 
in Pass 1).  Once the optimisation has been made, and knowing that 
%rax's value is discarded afterwards, the two LEA instructions can be 
merged:

.Lj4072:
     ...
     jne    .Lj4070
# Peephole Optimization: SubMov2Lea
     leaq    -1(%rsi,%r13),%r12
     ...
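
To make the merge condition concrete, here is a minimal sketch in Python rather than compiler code.  The Lea record and the merge_leas function are hypothetical names invented for illustration and are not part of the FPC sources; the sketch assumes the two LEAs are adjacent and that the caller already knows whether the first destination register is used afterwards.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lea:
    """Models `leaq disp(base,index,scale), dest` (AT&T operand order)."""
    disp: int
    base: Optional[str]
    index: Optional[str]
    scale: int
    dest: str

    def __str__(self) -> str:
        disp = str(self.disp) if self.disp else ""
        regs = f"%{self.base}" if self.base else ""
        if self.index:
            regs += f",%{self.index}"
            if self.scale != 1:
                regs += f",{self.scale}"
        return f"leaq    {disp}({regs}),%{self.dest}"


def merge_leas(first: Lea, second: Lea,
               first_dest_dead_after: bool) -> Optional[Lea]:
    """Fold two adjacent LEAs into one, or return None if that is unsafe.

    Hypothetical helper: it only handles the case where the second LEA
    uses the first LEA's result as a plain base register.
    """
    # The second LEA must read the first result as its base, with no index.
    if second.base != first.dest or second.index is not None:
        return None
    # If the destinations differ, the first result must not be needed later.
    if second.dest != first.dest and not first_dest_dead_after:
        return None
    disp = first.disp + second.disp
    # The combined displacement must still fit in a signed 32-bit field.
    if not -2**31 <= disp < 2**31:
        return None
    return Lea(disp, first.base, first.index, first.scale, second.dest)


first = Lea(0, "rsi", "r13", 1, "rax")    # leaq (%rsi,%r13),%rax
second = Lea(-1, "rax", None, 1, "r12")   # leaq -1(%rax),%r12
print(merge_leas(first, second, first_dest_dead_after=True))
# leaq    -1(%rsi,%r13),%r12
```

If %rax were still live after the second LEA, the function returns None and both instructions would have to stay.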

I've been working in a separate branch to improve the optimisations in 
OptPass1LEA so that it detects this (it currently doesn't because the 
two destination registers aren't identical).  This is why I call 
OptPass1LEA from OptPass2SUB in the patch provided on i38555, although, 
as I originally described, this feels somewhat hacky and risks opening 
up more bugs.  For example, when calling from OptPass2SUB, the first LEA 
is the previous instruction, so the register tracking is ahead by one 
instruction upon entering OptPass1LEA.  A safer and more thorough 
approach, although slower, would be to call Pass 1 again, where the 
register tracking is up to date.
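
As a rough illustration of "call Pass 1 again", the sketch below models the third variant from the proposal quoted further down: no extra run at -O1, one extra run of Pass 1 at -O2, and a bounded Pass 1 / Pass 2 loop at -O3.  It is written in Python purely for brevity.  The names run_peephole, fuse and max_cycles, and the limit of 8, are assumptions made for this example and do not correspond to the compiler's actual routines; each pass is modelled as a function that edits a list of mnemonics in place and returns True if it changed anything.

```python
from typing import Callable, List

# A pass mutates the instruction list and reports whether it changed it.
Pass = Callable[[List[str]], bool]


def run_peephole(code: List[str], pass1: Pass, pass2: Pass,
                 post_pass: Pass, opt_level: int,
                 max_cycles: int = 8) -> List[str]:
    """Hypothetical driver for the peephole passes."""
    pass1(code)
    changed = pass2(code)
    if opt_level == 2 and changed:
        # One extra run of Pass 1, then straight on to the post pass.
        pass1(code)
    elif opt_level >= 3:
        # Keep cycling until Pass 2 reports no change.  The cap guards
        # against an infinite loop caused by a compiler bug.
        cycles = 0
        while changed and cycles < max_cycles:
            pass1(code)
            changed = pass2(code)
            cycles += 1
    post_pass(code)
    return code


def fuse(pair, replacement: str) -> Pass:
    """Toy pass: replace each adjacent `pair` of mnemonics with one."""
    def run(code: List[str]) -> bool:
        changed = False
        i = 0
        while i + 1 < len(code):
            if (code[i], code[i + 1]) == pair:
                code[i:i + 2] = [replacement]
                changed = True
            else:
                i += 1
        return changed
    return run


toy_pass1 = fuse(("lea", "lea"), "lea")   # stands in for the LEA merge
toy_pass2 = fuse(("mov", "sub"), "lea")   # stands in for SubMov2Lea
no_post = lambda code: False

print(run_peephole(["lea", "mov", "sub"], toy_pass1, toy_pass2, no_post, 1))
print(run_peephole(["lea", "mov", "sub"], toy_pass1, toy_pass2, no_post, 3))
```

In this toy model the -O1 run leaves two LEAs behind because Pass 1 never sees the output of Pass 2, whereas the -O3 run merges them.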

Gareth aka. Kit


On 28/02/2021 01:51, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> I'm currently developing some new optimisations for Lea instructions 
> after I discovered some new potential ones after fixing i38527.  That 
> aside though, sometimes these optimisations only become apparent after 
> Pass 2 has completed.  I've tried to change the order of things so the 
> optimisation is made in Pass 1, but there's no easy combination that 
> ensures the best optimisations take place (i.e. I make a change to 
> improve one optimisation, and another one is made worse at the same 
> time).
>
> I've taken to calling OptPass1XXX routines from OptPass2XXX routines 
> in places where this is likely to happen, and so far this produces the 
> best code - however, it feels hacky and problems may occur with 
> register tracking if OptPass1XXX is called on a different instruction 
> to the current one (e.g. one optimisation I've found requires calling 
> GetLastInstruction and then calling OptPass1LEA on the result if it's 
> a LEA instruction).
>
> So to help clean up the code and provide the best output, I would like 
> to propose a cross-platform change to the peephole optimizer:
>
> - Under -O3, if a change was made in Pass 2 (implied if any of the 
> OptPass2XXX routines return True), the peephole optimiser cycles back 
> to Pass 1 and tries again.
>
> There are a few variants for this:
>
> - Once Pass 1 has been run again after Pass 2, it goes to the 
> Post-peephole Pass regardless of whether anything was changed.
>
> - It goes through the whole process again: after Pass 1 is called 
> again, Pass 2 is then called again, and if Pass 2 returns True again, 
> it goes back to Pass 1, repeating as many times as needed (or until it 
> hits an upper limit to prevent an infinite loop due to a compiler 
> bug).  Only once Pass 2 returns False does it go to the Post-peephole 
> Pass.
>
> - The third variant is that variant 1 is done for -O2 and variant 2 is 
> done for -O3 (and no extra run of Pass 1 for -O1).
>
> The obvious side-effect is that it causes the compiler to run slightly 
> slower, but this could potentially be mitigated by merging the 
> Pre-Peephole Pass with Pass 1, thus eliminating a distinct pass, while 
> any missed optimisations that occur due to this are picked up in the 
> second call to Pass 1 (it will most likely be picked up in the first 
> call to Pass 1 due to PeepHoleOptPass1Cpu returning True and 
> signalling another iteration).
>
> What are everyone's thoughts?
>
> Gareth aka. Kit
>
>
