[fpc-devel] -O3 peephole proposal... run Pass 1 again if Pass 2 returns True

J. Gareth Moreton gareth at moreton-family.com
Sun Feb 28 02:51:25 CET 2021


Hi everyone,

I'm currently developing some new optimisations for LEA instructions, 
having spotted some potential ones while fixing i38527.  That aside, 
some of these optimisations only become apparent after Pass 2 has 
completed.  I've tried reordering things so that the optimisation is 
made in Pass 1, but there's no easy combination that ensures the best 
optimisations take place (i.e. when I make a change to improve one 
optimisation, another one is made worse at the same time).

I've taken to calling OptPass1XXX routines from OptPass2XXX routines in 
places where this is likely to happen, and so far this produces the best 
code - however, it feels hacky and problems may occur with register 
tracking if OptPass1XXX is called on a different instruction to the 
current one (e.g. one optimisation I've found requires calling 
GetLastInstruction and then calling OptPass1LEA on the result if it's a 
LEA instruction).

So, to help clean up the code and provide the best output, I would like 
to propose a cross-platform change to the peephole optimiser:

- Under -O3, if a change was made in Pass 2 (implied if any of the 
OptPass2XXX routines return True), the peephole optimiser cycles back to 
Pass 1 and tries again.

There are a few variants for this:

- After Pass 1 has been re-run following Pass 2, it goes straight to the 
Post-peephole Pass regardless of whether anything was changed.

- It goes through the whole process again: after Pass 1 is re-run, 
Pass 2 is called again, and if Pass 2 returns True again, it goes back 
to Pass 1, repeating as many times as needed (or until it hits an upper 
limit, to prevent an infinite loop due to a compiler bug).  Only once 
Pass 2 returns False does it go on to the Post-peephole Pass.

- The third variant is that variant 1 is done for -O2 and variant 2 is 
done for -O3 (and no extra run of Pass 1 for -O1).
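
As a rough illustration of the control flow for these variants, here is 
a language-neutral sketch in Python.  It is not the compiler's actual 
code: pass1, pass2 and post_pass are hypothetical stand-ins for the real 
passes (each returns True if it changed anything), MAX_CYCLES is an 
assumed value for the upper limit, and each pass is treated as a single 
call even though the real Pass 1 may iterate internally.

```python
# Hypothetical sketch only: names and the limit below are assumptions,
# not identifiers from the Free Pascal sources.
MAX_CYCLES = 8  # assumed upper limit guarding against an infinite loop


def variant_for_opt_level(level):
    """Third variant: -O1 -> no re-run, -O2 -> variant 1, -O3 -> variant 2."""
    if level >= 3:
        return 2
    if level == 2:
        return 1
    return 0


def run_peephole(code, pass1, pass2, post_pass, variant):
    """Run the passes over `code`; return how many times Pass 1 ran."""
    pass1(code)
    pass1_runs = 1
    changed = pass2(code)

    if variant == 1 and changed:
        # Variant 1: one extra Pass 1, then straight to the post pass.
        pass1(code)
        pass1_runs += 1
    elif variant == 2:
        # Variant 2: keep cycling while Pass 2 reports a change.
        cycles = 0
        while changed and cycles < MAX_CYCLES:
            pass1(code)
            pass1_runs += 1
            changed = pass2(code)
            cycles += 1

    post_pass(code)
    return pass1_runs
```

With variant 0 the behaviour is the same as having no extra run at all, 
so the cost of the change is only paid at the higher optimisation levels.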

The obvious side-effect is that it causes the compiler to run slightly 
slower, but this could potentially be mitigated by merging the 
Pre-peephole Pass with Pass 1, thus eliminating a distinct pass.  Any 
optimisations missed as a result would be picked up in the second call 
to Pass 1 (although they will most likely be picked up in the first 
call to Pass 1, due to PeepHoleOptPass1Cpu returning True and signalling 
another iteration).

What are everyone's thoughts?

Gareth aka. Kit

