[fpc-devel] New deep optimisation

Fri Oct 1 18:10:57 CEST 2021

Hi everyone,

I've started playing around with an optimisation on x86 platforms that 
looks for common instructions that appear on both branches of a Jcc 
instruction (i.e. after the label it jumps to and after the jump 
itself), and so far I'm having a lot of success.  For example, in the 
Math unit - before:

     ...
# Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
     call    fpc_do_is
     testb    %al,%al
     je    .Lj196
     movq    %rdi,%rdx
     movq    %rsi,%rcx
     call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
     movb    %al,%bl
     jmp    .Lj197
     .p2align 4,,10
     .p2align 3
.Lj196:
     movq    %rdi,%rdx
     movq    %rsi,%rcx
     call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
     movb    %al,%bl
.Lj197:
     movb    %bl,%al
     ...

After:

     ...
# Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
     call    fpc_do_is
# Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
     movq    %rdi,%rdx
# Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
     movq    %rsi,%rcx
     testb    %al,%al
# Peephole Optimization: Moved mov instruction common to both branches to before jump
# Peephole Optimization: Moved mov instruction common to both branches to before jump
# Peephole Optimization: Moved destination label ahead of common instructions
     je    .Lj198
     call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
     movb    %al,%bl
     jmp    .Lj197
     .p2align 4,,10
     .p2align 3
.Lj198:
     call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
     movb    %al,%bl
.Lj197:
     movb    %bl,%al
     ...

In the above example, the parameter setup prior to the two CALL 
instructions is identical on both branches, so the optimiser can move 
those instructions to before the branching jump.
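
To make the idea concrete, here is a minimal sketch in Python rather than 
the compiler's own Pascal. It is not the actual implementation: the tuple 
representation, the FLAG_NEUTRAL set and hoist_common_prefix are all 
hypothetical names chosen for illustration. It only hoists a leading run 
of identical instructions that are assumed not to touch the flags.

```python
# Hypothetical model: an instruction is a (mnemonic, operands) tuple.
# Assumption: only these mnemonics are treated as leaving the flags alone,
# so only they may be moved above the conditional jump.
FLAG_NEUTRAL = {"movq", "movl", "movb", "leaq"}


def hoist_common_prefix(fallthrough, target):
    """Return (hoisted, new_fallthrough, new_target).

    `fallthrough` is the code after the Jcc itself and `target` is the code
    after the label it jumps to. Only a leading run of identical,
    flag-neutral instructions is hoisted.
    """
    count = 0
    for a, b in zip(fallthrough, target):
        if a != b or a[0] not in FLAG_NEUTRAL:
            break
        count += 1
    return fallthrough[:count], fallthrough[count:], target[count:]


fallthrough = [
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("call", "CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN"),
    ("movb", "%al,%bl"),
]
target = [
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("call", "SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN"),
    ("movb", "%al,%bl"),
]

hoisted, new_fallthrough, new_target = hoist_common_prefix(fallthrough, target)
```

The sketch leaves out everything else a real pass has to handle, such as 
creating a new destination label after the hoisted instructions (.Lj198 in 
the listing above) and checking that a hoisted instruction does not 
overwrite a register the preceding TEST reads.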

However, some optimisations are not triggering because they expect a 
jump or SETcc instruction to appear directly after a TEST instruction, 
for example, and I can't just track the FLAGS register because the 
optimisation has to check the condition that's being used too (e.g. 
"MovAndTest2Test" requires the condition be C_E or C_NE).

There are a couple of solutions to this:

- Some optimisations, like those in the post-peephole stage, could be 
adapted to look further ahead for an appropriate instruction, stopping 
if they find one or if they find another instruction that modifies the 
flags.  This would make the compiler code more complicated, though.

- Have a flag that tells the compiler to run pass 1 again after pass 2 
(and have my common instruction optimisations in pass 2). This would 
allow deeper optimisations but may cause significant slowdown in the 
compiler, so I would only recommend this flag be honoured under -O3 and -O4.
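
As a rough illustration of the first option, the look-ahead could be 
shaped like the Python sketch below. Everything in it is an assumption 
made for the example: the (mnemonic, condition) tuples, the instruction 
sets and the name find_flag_consumer are hypothetical and are not taken 
from the compiler.

```python
# Hypothetical instruction model: (mnemonic, condition), where condition is
# None for anything that is not a Jcc or SETcc. The sets below are
# illustrative assumptions, not the compiler's real tables.
WRITES_FLAGS = {"test", "cmp", "add", "sub", "and", "or", "xor"}
BARRIERS = {"call", "jmp", "label", "ret"}
FLAG_CONSUMERS = {"jcc", "setcc"}


def find_flag_consumer(instrs, start, allowed):
    """Scan forward from instrs[start] for the first Jcc or SETcc.

    Return its index if its condition is in `allowed`. Return None if the
    condition is not allowed, or if the flags are overwritten or control
    flow is interrupted before a consumer is found.
    """
    for index in range(start, len(instrs)):
        mnemonic, condition = instrs[index]
        if mnemonic in FLAG_CONSUMERS:
            return index if condition in allowed else None
        if mnemonic in WRITES_FLAGS or mnemonic in BARRIERS:
            return None
    return None


# A TEST separated from its jump by two MOVs; scanning starts after the TEST.
code = [("test", None), ("mov", None), ("mov", None), ("jcc", "E")]
found = find_flag_consumer(code, 1, {"E", "NE"})
rejected = find_flag_consumer(code, 1, {"L"})
```

Here `found` is the index of the jump, while `rejected` is None because 
the condition is not one the caller accepts. The sketch stops at the first 
consumer, so an optimisation that changes what the flags contain would 
also need to account for any later instruction that still reads them.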

I'm trying to weigh the pros and cons of each, not least because in some 
cases, my common instruction optimisations aren't as efficient in pass 2 
because other pass 1 optimisations ensure the instructions either side 
of the branch are no longer identical.
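
For comparison, the second option amounts to a small change in how the 
passes are driven. The Python sketch below is hypothetical throughout: 
run_peephole, the (block, changed) return convention and the 
rerun_min_level parameter are invented for illustration and say nothing 
about how the compiler actually structures its passes.

```python
def run_peephole(block, pass1, pass2, opt_level, rerun_min_level=3):
    """Run pass 1, then pass 2, then pass 1 once more if requested.

    `pass1` and `pass2` are hypothetical callables that take a list of
    instructions and return (new_block, changed). The rerun only happens
    when pass 2 reports a change and the optimisation level is high enough.
    """
    block, _ = pass1(block)
    block, rerun_requested = pass2(block)
    if rerun_requested and opt_level >= rerun_min_level:
        block, _ = pass1(block)
    return block


# Toy passes that just record the order in which they are called.
calls = []


def toy_pass1(block):
    calls.append("pass1")
    return block, False


def toy_pass2(block):
    calls.append("pass2")
    return block + ["hoisted"], True


result = run_peephole([], toy_pass1, toy_pass2, opt_level=3)
```

With opt_level=3 the recorded order is pass1, pass2, pass1; with a lower 
level the final pass 1 run is skipped, which is where the compile-time 
saving would come from.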

Currently I'm seeing if I can avoid rerunning pass 1 and instead 
improve the problematic optimisations so that they are more flexible 
about the location of their SETcc and Jcc instructions.

Gareth aka. Kit
