[fpc-devel] New deep optimisation

Fri Oct 1 20:00:38 CEST 2021

Keep in mind that test/cmp and jcc instructions are usually macro-fused,
but only if they are directly adjacent.
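
As a rough illustration of the adjacency requirement, here is a minimal
Python sketch. It is not compiler code: the helper names and the
condition list are made up for this example. It reports where a test/cmp
is immediately followed by a jcc in a list of instructions. It only
checks adjacency; which pairs a given CPU actually fuses is not modelled.

```python
# Hypothetical helper data: size-suffixed AT&T mnemonics for test/cmp and a
# partial list of condition codes. Neither is taken from the compiler.
COMPARES = {op + size for op in ("test", "cmp") for size in "bwlq"}
CONDITIONS = {"e", "ne", "l", "le", "g", "ge", "b", "be", "a", "ae", "s", "ns"}


def is_jcc(mnemonic):
    """True for a conditional jump such as 'je'; False for 'jmp'."""
    return mnemonic.startswith("j") and mnemonic[1:] in CONDITIONS


def adjacent_pairs(instrs):
    """Indices i where instrs[i] is test/cmp and instrs[i + 1] is a jcc."""
    return [
        i
        for i in range(len(instrs) - 1)
        if instrs[i].split()[0] in COMPARES and is_jcc(instrs[i + 1].split()[0])
    ]


# Instruction order taken from the "After" listing in the quoted mail.
after_listing = [
    "call fpc_do_is",
    "movq %rdi,%rdx",
    "movq %rsi,%rcx",
    "testb %al,%al",
    "je .Lj198",
]

# Made-up ordering where a mov sits between the test and the jump.
separated = [
    "call fpc_do_is",
    "testb %al,%al",
    "movq %rdi,%rdx",
    "movq %rsi,%rcx",
    "je .Lj198",
]

print(adjacent_pairs(after_listing))
print(adjacent_pairs(separated))
```

The order from the "After" listing keeps the testb/je pair adjacent,
because the movs end up ahead of the test rather than between it and the
jump; the made-up second ordering has no adjacent pair.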

On 01.10.2021 at 18:10, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> I've started playing around with an optimisation on x86 platforms that 
> looks for common instructions that appear on both branches of a Jcc 
> instruction (i.e. after the label it jumps to and after the jump 
> itself), and so far I'm having a lot of success.  For example, in the 
> Math unit - before:
>
>     ...
> # Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
>     call    fpc_do_is
>     testb    %al,%al
>     je    .Lj196
>     movq    %rdi,%rdx
>     movq    %rsi,%rcx
>     call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
>     movb    %al,%bl
>     jmp    .Lj197
>     .p2align 4,,10
>     .p2align 3
> .Lj196:
>     movq    %rdi,%rdx
>     movq    %rsi,%rcx
>     call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
>     movb    %al,%bl
> .Lj197:
>     movb    %bl,%al
>     ...
>
> After:
>
>     ...
> # Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
>     call    fpc_do_is
> # Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
>     movq    %rdi,%rdx
> # Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
>     movq    %rsi,%rcx
>     testb    %al,%al
> # Peephole Optimization: Moved mov instruction common to both branches to before jump
> # Peephole Optimization: Moved mov instruction common to both branches to before jump
> # Peephole Optimization: Moved destination label ahead of common instructions
>     je    .Lj198
>     call    CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
>     movb    %al,%bl
>     jmp    .Lj197
>     .p2align 4,,10
>     .p2align 3
> .Lj198:
>     call    SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
>     movb    %al,%bl
> .Lj197:
>     movb    %bl,%al
>     ...
>
> In the above example, the instructions that set up the parameters for
> the two CALL instructions are identical, so the optimisation can move
> them to before the conditional jump.
>
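
The hoisting step described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the
FPC implementation: instructions are plain strings, the function name is
invented, and the allow-list of opcodes assumed safe to move ahead of the
jump (plain movs, which do not write the flags) is hypothetical. A real
pass also has to check register dependencies, which is not modelled here.

```python
# Hypothetical allow-list: opcodes assumed not to write the FLAGS register,
# so moving them ahead of the Jcc cannot change which way it branches.
FLAG_NEUTRAL = {"movb", "movw", "movl", "movq"}


def hoist_common(fallthrough, target):
    """Split two branch bodies into (hoisted, rest_fallthrough, rest_target).

    'hoisted' is the longest run of leading instructions that is identical
    on both branches and consists only of allow-listed opcodes.
    """
    hoisted = []
    for a, b in zip(fallthrough, target):
        if a != b or a.split()[0] not in FLAG_NEUTRAL:
            break
        hoisted.append(a)
    n = len(hoisted)
    return hoisted, fallthrough[n:], target[n:]


# The two branch bodies from the "before" listing.
fallthrough = [
    "movq %rdi,%rdx",
    "movq %rsi,%rcx",
    "call CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN",
    "movb %al,%bl",
]
target = [
    "movq %rdi,%rdx",
    "movq %rsi,%rcx",
    "call SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN",
    "movb %al,%bl",
]

hoisted, rest_fallthrough, rest_target = hoist_common(fallthrough, target)
print(hoisted)
print(rest_fallthrough)
print(rest_target)
```

For the listing above, the two movq instructions are hoisted and each
branch is left with its CALL followed by the movb.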
> However, some optimisations no longer trigger because they expect, for
> example, a jump or SETcc instruction to appear directly after a TEST
> instruction. I can't just track the FLAGS register either, because the
> optimisation also has to check which condition is being used (e.g.
> "MovAndTest2Test" requires the condition be C_E or C_NE).
>
> There are a couple of solutions to this:
>
> - Some optimisations, like those in the post-peephole stage, could be
> adapted to look ahead further for an appropriate instruction, stopping
> if they find one or if they find another instruction that modifies the
> flags.  This will produce more complicated compiler code though.
>
> - Have a flag that tells the compiler to run pass 1 again after pass 2 
> (and have my common instruction optimisations in pass 2). This would 
> allow deeper optimisations but may cause significant slowdown in the 
> compiler, so I would only recommend this flag be honoured under -O3 
> and -O4.
>
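
The first option (looking further ahead for the instruction that consumes
the flags) might look roughly like the Python sketch below. Everything in
it is an assumption made for illustration: the function names, the
condition list, and the allow-list of instructions treated as leaving the
flags alone. It is deliberately conservative, giving up at anything not
on the allow-list, and it only returns the first consumer it finds.

```python
# Hypothetical, partial tables; not taken from the compiler.
CONDITIONS = {"e", "ne", "l", "le", "g", "ge", "b", "be", "a", "ae", "s", "ns"}
FLAG_NEUTRAL = {"movb", "movw", "movl", "movq"}


def consumer_condition(mnemonic):
    """Return the condition of a Jcc/SETcc mnemonic ('je' -> 'e'), else None."""
    for prefix in ("set", "j"):
        if mnemonic.startswith(prefix) and mnemonic[len(prefix):] in CONDITIONS:
            return mnemonic[len(prefix):]
    return None


def find_flag_consumer(instrs, start, allowed):
    """Index of the first Jcc/SETcc after instrs[start] whose condition is in
    'allowed', or None if the search has to stop first."""
    for i in range(start + 1, len(instrs)):
        mnemonic = instrs[i].split()[0]
        cond = consumer_condition(mnemonic)
        if cond is not None:
            # Found the consumer; accept it only for the permitted conditions.
            return i if cond in allowed else None
        if mnemonic not in FLAG_NEUTRAL:
            # Unknown instruction: it may modify the flags or transfer control.
            return None
    return None


# Made-up sequences with the consumer separated from the test.
ok = ["testb %al,%al", "movq %rdi,%rdx", "movq %rsi,%rcx", "je .Lj198"]
clobbered = ["testb %al,%al", "addq $1,%rax", "je .Lj198"]
wrong_cond = ["testb %al,%al", "movq %rdi,%rdx", "jl .Lj198"]

equality_only = {"e", "ne"}  # mirrors the C_E / C_NE requirement mentioned above
print(find_flag_consumer(ok, 0, equality_only))
print(find_flag_consumer(clobbered, 0, equality_only))
print(find_flag_consumer(wrong_cond, 0, equality_only))
```

Only the first sequence yields a consumer; the other two stop early,
one because an instruction outside the allow-list intervenes and one
because the jump uses a condition other than equal/not equal.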
> I'm trying to weigh the pros and cons of each, not least because in
> some cases, my common instruction optimisations aren't as effective in
> pass 2 because other pass 1 optimisations ensure the instructions on
> either side of the branch are no longer identical.
>
> Currently I'm seeing if I can avoid rerunning pass 1 and instead
> improve the problematic optimisations to be more flexible with the
> location of their SETcc and Jcc instructions.
>
> Gareth aka. Kit
>
>