[fpc-devel] New deep optimisation
Stefan Glienke
sglienke at dsharp.org
Fri Oct 1 20:00:38 CEST 2021
Keep in mind that test/cmp and jcc instructions are usually macro-fused,
but only if they are directly adjacent.
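
As a rough illustration of the adjacency point, the sketch below scans a toy instruction list for test/cmp instructions that are directly followed by a conditional jump. The tuple representation and the function names are made up for this example, and the rule is deliberately simplified: which pairs actually fuse depends on the CPU.

```python
# Toy model: an instruction is a (mnemonic, operands) tuple in AT&T syntax.
# Assumption: only test/cmp directly followed by Jcc counts as a candidate;
# real macro-fusion rules are microarchitecture-specific.

def is_fusion_candidate(mnemonic):
    # Strip an AT&T size suffix (b/w/l/q) before comparing.
    return mnemonic.rstrip("bwlq") in {"test", "cmp"}

def is_cond_jump(mnemonic):
    return mnemonic.startswith("j") and mnemonic != "jmp"

def adjacent_fusion_pairs(instrs):
    """Return indices i where instrs[i] is test/cmp and instrs[i+1] is a Jcc."""
    return [
        i
        for i in range(len(instrs) - 1)
        if is_fusion_candidate(instrs[i][0]) and is_cond_jump(instrs[i + 1][0])
    ]

# MOVs hoisted between the test and the jump break adjacency ...
hoisted_between = [
    ("testb", "%al,%al"),
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("je", ".Lj198"),
]
# ... while swapping the test down past the MOVs restores it.
test_swapped_down = [
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("testb", "%al,%al"),
    ("je", ".Lj198"),
]
```

The second list matches the order in the "After" listing quoted below, where the test ends up right in front of the je.
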
On 01.10.2021 at 18:10, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> I've started playing around with an optimisation on x86 platforms that
> looks for common instructions that appear on both branches of a Jcc
> instruction (i.e. after the label it jumps to and after the jump
> itself), and so far I'm having a lot of success. For example, in the
> Math unit - before:
>
> ...
> # Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
> call fpc_do_is
> testb %al,%al
> je .Lj196
> movq %rdi,%rdx
> movq %rsi,%rcx
> call CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
> movb %al,%bl
> jmp .Lj197
> .p2align 4,,10
> .p2align 3
> .Lj196:
> movq %rdi,%rdx
> movq %rsi,%rcx
> call SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
> movb %al,%bl
> .Lj197:
> movb %bl,%al
> ...
>
> After:
>
> ...
> # Peephole Optimization: %rdx = %rdi; removed unnecessary instruction (MovMov2MovNop 6b}
> call fpc_do_is
> # Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
> movq %rdi,%rdx
> # Peephole Optimization: Swapped test and mov instructions to improve optimisation potential
> movq %rsi,%rcx
> testb %al,%al
> # Peephole Optimization: Moved mov instruction common to both branches to before jump
> # Peephole Optimization: Moved mov instruction common to both branches to before jump
> # Peephole Optimization: Moved destination label ahead of common instructions
> je .Lj198
> call CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
> movb %al,%bl
> jmp .Lj197
> .p2align 4,,10
> .p2align 3
> .Lj198:
> call SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
> movb %al,%bl
> .Lj197:
> movb %bl,%al
> ...
>
> In the above example, the parameter setup before each of the two CALL
> instructions is identical (the same two MOVs), so the optimiser can
> move those MOVs to before the conditional jump.
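
For readers who want to see the idea in miniature, here is a sketch of the hoisting step on a toy instruction list. It is not the FPC implementation: the tuple representation and the name hoist_common_prefix are invented for illustration, and it assumes that only plain MOVs are hoisted because they leave FLAGS untouched.

```python
# Toy model: an instruction is a (mnemonic, operands) tuple in AT&T syntax.
# Assumption: only plain MOVs are hoisted, since they do not modify FLAGS
# and are therefore safe to place ahead of the conditional jump.
PLAIN_MOVS = {"movb", "movw", "movl", "movq"}

def hoist_common_prefix(fallthrough, taken):
    """Split off the identical leading MOVs shared by both branches.

    Returns (hoisted, fallthrough_rest, taken_rest).
    """
    hoisted = []
    for a, b in zip(fallthrough, taken):
        if a != b or a[0] not in PLAIN_MOVS:
            break
        hoisted.append(a)
    n = len(hoisted)
    return hoisted, fallthrough[n:], taken[n:]

# The two branches from the "before" listing above.
fallthrough = [
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("call", "CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN"),
    ("movb", "%al,%bl"),
    ("jmp", ".Lj197"),
]
taken = [
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("call", "SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN"),
    ("movb", "%al,%bl"),
]

hoisted, fallthrough_rest, taken_rest = hoist_common_prefix(fallthrough, taken)
```

The sketch stops at the first difference (the two CALLs), so the common trailing "movb %al,%bl" is left alone, as in the listing.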
>
> However, some optimisations are not triggering because they expect a
> jump or SETcc instruction to appear directly after a TEST instruction,
> for example, and I can't just track the FLAGS register because the
> optimisation also has to check which condition is being used (e.g.
> "MovAndTest2Test" requires the condition to be C_E or C_NE).
>
> There are a couple of solutions to this:
>
> - Some optimisations, like those in the post-peephole stage, could be
> adapted to look ahead further for an appropriate instruction, stopping
> if they find one or if they find another instruction that modifies the
> flags. This will produce more complicated compiler code though.
>
> - Have a flag that tells the compiler to run pass 1 again after pass 2
> (and have my common instruction optimisations in pass 2). This would
> allow deeper optimisations but may cause significant slowdown in the
> compiler, so I would only recommend this flag be honoured under -O3
> and -O4.
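
The first option (looking ahead for the flag consumer) might look roughly like the sketch below. Everything here is hypothetical: the names, the toy tuple representation, and the conservative rule that only MOV and LEA are treated as flag-neutral. It also only finds the first consumer; a real implementation would have to make sure no later instruction reads the same flags.

```python
# Toy model: an instruction is a (mnemonic, operands) tuple in AT&T syntax.
# Assumption: only these mnemonics are known to leave FLAGS untouched;
# anything else (calls, arithmetic, ...) conservatively ends the search.
FLAG_NEUTRAL = {"movb", "movw", "movl", "movq", "leal", "leaq"}
CONDITION_ALIASES = {"z": "e", "nz": "ne"}

def condition_of(mnemonic):
    """Return the condition suffix of a Jcc/SETcc, or None for anything else."""
    if mnemonic.startswith("set"):
        cond = mnemonic[3:]
    elif mnemonic.startswith("j") and mnemonic != "jmp":
        cond = mnemonic[1:]
    else:
        return None
    return CONDITION_ALIASES.get(cond, cond)

def find_flag_consumer(instrs, start):
    """Scan forward from the flag-setting instruction at instrs[start].

    Returns (index, condition) of the first Jcc/SETcc, or None if the
    search hits an instruction that is not known to preserve FLAGS.
    """
    for i in range(start + 1, len(instrs)):
        mnemonic = instrs[i][0]
        cond = condition_of(mnemonic)
        if cond is not None:
            return i, cond
        if mnemonic not in FLAG_NEUTRAL:
            return None
    return None

def e_ne_optimisation_applies(instrs, start):
    """True if the first flag consumer uses an equal / not-equal condition."""
    found = find_flag_consumer(instrs, start)
    return found is not None and found[1] in {"e", "ne"}

separated = [
    ("testb", "%al,%al"),
    ("movq", "%rdi,%rdx"),
    ("movq", "%rsi,%rcx"),
    ("je", ".Lj198"),
]
clobbered = [("testb", "%al,%al"), ("addq", "$1,%rax"), ("je", ".Lj198")]
```

With this, an optimisation that needs C_E or C_NE would still apply to "separated", where two MOVs sit between the TEST and the JE, and would be rejected for "clobbered".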
>
> I'm trying to weigh the pros and cons of each, not least because in
> some cases, my common instruction optimisations aren't as efficient in
> pass 2 because other pass 1 optimisations ensure the instructions
> either side of the branch are no longer identical.
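
The second option (rerunning pass 1 after pass 2) can be pictured as a small driver loop. This is only a sketch under assumptions: run_peephole, allow_rerun and max_rounds are invented names, allow_rerun stands in for "-O3 or -O4 is active", and the cap on rounds is an extra safeguard that the mail does not mention. The two toy passes exist only to make the loop runnable.

```python
def run_peephole(instrs, pass1, pass2, allow_rerun, max_rounds=4):
    """Run pass 1 then pass 2, repeating while pass 2 asks for a rerun.

    pass1 returns a new instruction list; pass2 returns
    (new_list, wants_rerun). max_rounds bounds the compile-time cost.
    """
    for _ in range(max_rounds):
        instrs = pass1(instrs)
        instrs, wants_rerun = pass2(instrs)
        if not (allow_rerun and wants_rerun):
            break
    return instrs

# Toy passes (purely illustrative, not real FPC optimisations).
def toy_pass1(instrs):
    # Pass 1 knows how to delete "nop".
    return [ins for ins in instrs if ins != "nop"]

def toy_pass2(instrs):
    # Pass 2 turns one "redundant" into "nop" and asks for pass 1 again.
    out = list(instrs)
    if "redundant" in out:
        out[out.index("redundant")] = "nop"
        return out, True
    return out, False

code = ["a", "redundant", "b"]
single_round = run_peephole(code, toy_pass1, toy_pass2, allow_rerun=False)
with_rerun = run_peephole(code, toy_pass1, toy_pass2, allow_rerun=True)
```

Without the rerun, the "nop" created by pass 2 is left behind; with it, the second round of pass 1 cleans it up, at the cost of running both passes again.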
>
> Currently I'm seeing if I can avoid rerunning pass 1 and instead
> improve the problematic optimisations to be more flexible with the
> location of their SETcc and Jcc instructions.
>
> Gareth aka. Kit
>
>