[fpc-devel] Peephole Pass 1 Optimisation Suggestion
J. Gareth Moreton
gareth at moreton-family.com
Tue Feb 4 12:10:52 CET 2020
I have an idea for improving compilation speed. It mostly applies to the x86 family, but I see no
reason why it cannot be platform-agnostic. The idea is basically this:
- The optimisation level selected (-O1, -O2, -O3/-O4) dictates the MAXIMUM number of times Pass 1 is executed
for a block of code. Maximum count will be 1 for -O1, 2 for -O2 and 5 for -O3 and -O4.
- Pass 1 optimisation is stopped if the maximum pass count is reached or if no changes were made (no functions
returned True for that iteration).
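The two rules above can be sketched as a bounded loop. This is a minimal illustration under simplifying assumptions, not the compiler's code. Each "optimisation" is a hypothetical whole-block function that returns True if it changed something, which mirrors the Result convention. The real peephole optimiser walks instruction by instruction instead. The names `MAX_PASS1_ITERATIONS`, `run_pass1` and `remove_nop` are made up for this example.

```python
from typing import Callable, List

# Proposed maximum number of Pass 1 iterations per optimisation level.
MAX_PASS1_ITERATIONS = {1: 1, 2: 2, 3: 5, 4: 5}

# Hypothetical stand-in for a peephole method: it may modify the block in
# place and returns True if it changed anything.
Optimisation = Callable[[List[str]], bool]


def run_pass1(block: List[str], optimisations: List[Optimisation], level: int) -> int:
    """Run Pass 1 over `block`; return how many iterations were executed."""
    max_iterations = MAX_PASS1_ITERATIONS[level]
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        changed = False
        for opt in optimisations:
            # Call every method and remember whether any of them made a change.
            if opt(block):
                changed = True
        if not changed:
            break  # No method returned True this iteration, so stop early.
    return iterations


def remove_nop(block: List[str]) -> bool:
    """Hypothetical optimisation: removes one 'nop' per call."""
    if "nop" in block:
        block.remove("nop")
        return True
    return False


if __name__ == "__main__":
    code = ["mov", "nop", "nop", "ret"]
    print(run_pass1(code, [remove_nop], level=3), code)  # prints: 3 ['mov', 'ret']
```

At level 1 the same block stops after one iteration with one `nop` still present. This is the "slightly less efficient code" trade-off. At level 3 the loop ends on the first iteration that changes nothing.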
Currently, on x86 at least, Pass 1 is always run at least twice, even if the first iteration did not
change anything. Under -O3 and -O4, Pass 1 is run as many times as needed until all individual
optimisation methods return False, but then a final iteration of Pass 1 is run anyway. The main reason for
this is that some optimisations may make a change but forget to set Result to True (assembler comparisons
under -O2 will detect some of these).
In terms of benefits:
- -O1, being the quick, debugger-friendly option, will compile faster because an entire iteration of Pass 1
is dropped. The cost is slightly less efficient code, which I consider acceptable because -O1 should only be
used for debugging high-level code, not for release builds.
- -O2 will be approximately the same speed, except for the simplest routines. These will be slightly faster
because their first iteration changes nothing, so Pass 1 stops after one run instead of two.
- -O3 and -O4 will be faster because they drop at least one run-through of Pass 1. There is a chance that the
most complex routines will be less optimal, but after 5 iterations the vast majority of code blocks should be
fully optimised. If not, I'd argue that some of the optimisation routines could be improved to do more in a
single pass.
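The iteration counts behind these claims can be worked through with a simplified model. Let k be the number of Pass 1 iterations that still make changes. The -O3/-O4 figure for the current scheme follows the description above: run until nothing changes, then one extra run. The -O1/-O2 figure of exactly two iterations is an assumption, because the text only says "at least two". The function names are hypothetical.

```python
# Proposed caps: -O1 -> 1, -O2 -> 2, -O3/-O4 -> 5.
PROPOSED_CAP = {1: 1, 2: 2, 3: 5, 4: 5}


def proposed_passes(level: int, changing_iterations: int) -> int:
    """Every changing iteration plus one that finds nothing, capped per level."""
    return min(changing_iterations + 1, PROPOSED_CAP[level])


def current_passes(level: int, changing_iterations: int) -> int:
    """Simplified model of the current x86 behaviour (partly assumed)."""
    if level >= 3:
        # Until an iteration changes nothing, then one final extra iteration.
        return changing_iterations + 2
    # Assumption: -O1/-O2 always perform exactly two iterations.
    return 2


if __name__ == "__main__":
    for level in (1, 2, 3):
        for k in (0, 1, 4, 6):
            print(f"-O{level} k={k}: current={current_passes(level, k)} "
                  f"proposed={proposed_passes(level, k)}")
```

Under this model, -O3 saves one iteration whenever k + 1 <= 5. When k >= 5, the proposal stops at 5 while changes are still pending. That is the "less optimal for the most complex routines" case.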
Also, from a safety perspective, if there is a faulty optimisation that causes an infinite loop (e.g. two
optimisations that 'fight' each other, of which at least one partial example exists in x86), the maximum pass
count ensures the compiler still terminates, even under the highest optimisation settings. Originally, -O3
ran Pass 1 a maximum of 4 times (not including the second call to Pass 1 afterwards, which is why I
selected 5 as the maximum count), but this limit was removed at some point in the past, admittedly by myself,
under the mistaken belief that optimisations wouldn't produce buggy code or otherwise get caught in an
infinite loop.
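The safety argument can be seen with a toy pair of "fighting" optimisations. The two rewrites below are hypothetical: each undoes the other, so every iteration reports a change. An uncapped loop (`max_iterations=None`) would never finish on such a block. With a cap, the loop stops after the maximum count. This is an illustrative sketch, not the actual x86 case.

```python
from typing import Callable, List, Optional


def rewrite_a_to_b(block: List[str]) -> bool:
    """Hypothetical optimisation that prefers form 'B' over 'A'."""
    if "A" in block:
        block[block.index("A")] = "B"
        return True
    return False


def rewrite_b_to_a(block: List[str]) -> bool:
    """Hypothetical optimisation that prefers 'A' over 'B', fighting the one above."""
    if "B" in block:
        block[block.index("B")] = "A"
        return True
    return False


def run_pass1(block: List[str],
              optimisations: List[Callable[[List[str]], bool]],
              max_iterations: Optional[int]) -> int:
    """Return the iterations executed; None means 'repeat until nothing changes'."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        changed = False
        for opt in optimisations:
            if opt(block):
                changed = True
        if not changed:
            break
    return iterations


if __name__ == "__main__":
    fighters = [rewrite_a_to_b, rewrite_b_to_a]
    # With a cap of 5 the loop terminates even though every iteration reports a change.
    print(run_pass1(["A"], fighters, max_iterations=5))  # prints: 5
```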
For testing and comparison: since this only changes the number of runs of Pass 1 and not what Pass 1 actually
does, a side-by-side analysis of assembler dumps using a directory comparison tool will confirm that the output
code is unchanged for -O2 and higher, and measuring compilation time should confirm that there is indeed a saving.
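If no directory comparison tool is at hand, a small script can do the assembler-dump check. This sketch assumes the dumps from the old and new compilers have been kept (e.g. with FPC's `-a` option) in two separate directory trees. It also assumes the dump files end in `.s`; adjust `suffix` if not. The function name `differing_dumps` is hypothetical. It reports files present on only one side and files whose contents differ byte for byte.

```python
import filecmp
import os
from typing import List, Set


def differing_dumps(dir_before: str, dir_after: str, suffix: str = ".s") -> List[str]:
    """Relative paths of dumps that exist on one side only or whose contents differ."""

    def collect(root: str) -> Set[str]:
        # Gather every dump file under `root` as a path relative to `root`.
        found = set()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.endswith(suffix):
                    found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    before, after = collect(dir_before), collect(dir_after)
    differences = sorted(before ^ after)  # present in only one of the two trees
    for rel in sorted(before & after):
        # shallow=False forces a full content comparison instead of trusting os.stat().
        if not filecmp.cmp(os.path.join(dir_before, rel),
                           os.path.join(dir_after, rel), shallow=False):
            differences.append(rel)
    return differences
```

An empty result means the two builds produced identical assembler output for every dumped file.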
That's my plan... how does it sound?
Gareth aka. Kit