[fpc-devel] New deep optimisation
J. Gareth Moreton
gareth at moreton-family.com
Fri Oct 1 20:27:29 CEST 2021
Currently, there's an optimisation that tries to relocate MOV
instructions so they appear before CMP and TEST instructions (you can
see it occurring in the code sample). This pattern is usually generated
by the "J(c)Mov0JmpMov1 -> Set(c)" optimisation if the destination is
not an 8-bit register. In that case, moving the MOV instruction so it
appears before CMP (so long as it doesn't share any registers with it)
aids optimisation if it's something like "movl $0,%eax", which can't be
encoded as "xorl %eax,%eax" while the FLAGS register is in use, because
XOR overwrites the flags.
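
To illustrate the idea outside the compiler, here is a minimal Python
sketch (not the actual FPC peephole code). Instructions are modelled as
tuples in AT&T operand order. The names family, hoist_movs and
zero_movs_to_xor are hypothetical, and the register alias handling is
deliberately coarse. The second step assumes the flags are dead when a
"movl $0" is followed only by register MOVs and then a CMP/TEST, since
the comparison overwrites them.

```python
import re
from typing import List, Set, Tuple

# An instruction is (mnemonic, operand, ...) in AT&T order, destination last.
Instr = Tuple[str, ...]


def family(reg: str) -> str:
    """Coarse alias class so that %al, %eax and %rax count as one register."""
    name = reg.lstrip("%")
    for base in "abcd":
        if name in (base + "l", base + "h", base + "x",
                    "e" + base + "x", "r" + base + "x"):
            return base
    for base in ("si", "di", "sp", "bp"):
        if name in (base + "l", base, "e" + base, "r" + base):
            return base
    numbered = re.fullmatch(r"(r\d+)[dwb]?", name)
    return numbered.group(1) if numbered else name


def regs(ins: Instr) -> Set[str]:
    """All register families mentioned in any operand."""
    return {family(r) for operand in ins[1:]
            for r in re.findall(r"%\w+", operand)}


def is_compare(ins: Instr) -> bool:
    return re.fullmatch(r"(cmp|test)[bwlq]", ins[0]) is not None


def is_reg_mov(ins: Instr) -> bool:
    """Plain MOV whose destination is a register (stores are left alone)."""
    return re.fullmatch(r"mov[bwlq]", ins[0]) is not None and ins[-1].startswith("%")


def hoist_movs(code: List[Instr]) -> List[Instr]:
    """Swap CMP/TEST with a following MOV when they share no registers."""
    out = list(code)
    i = 0
    while i + 1 < len(out):
        first, second = out[i], out[i + 1]
        if is_compare(first) and is_reg_mov(second) and not (regs(first) & regs(second)):
            out[i], out[i + 1] = second, first
        i += 1
    return out


def zero_movs_to_xor(code: List[Instr]) -> List[Instr]:
    """Rewrite 'movl $0,%reg' as 'xorl %reg,%reg' where the flags are dead."""
    out = list(code)
    for i, ins in enumerate(out):
        if ins[0] == "movl" and ins[1] == "$0" and ins[2].startswith("%"):
            j = i + 1
            while j < len(out) and is_reg_mov(out[j]):
                j += 1  # register MOVs neither read nor write the flags
            if j < len(out) and is_compare(out[j]):
                out[i] = ("xorl", ins[2], ins[2])
    return out
```

Running hoist_movs and then zero_movs_to_xor over a CMP/MOV/Jcc sequence
moves the MOV ahead of the comparison and only then turns it into XOR.
If the MOV writes %eax while the TEST reads %al, nothing is moved.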
Macrofusion is an interesting point. I'll have to look into that one.
The "common instruction" optimisation only moves instructions that
don't touch the FLAGS register, and of those, only MOV instructions are
currently moved to appear before CMP and TEST instructions where
possible. In truth, any instruction that doesn't modify the flags and
doesn't share a register with the CMP/TEST instruction can be moved, and
can usually be executed in parallel with the comparison (using another
ALU, for example).
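
As a rough illustration of that restriction, the following Python sketch
(again, not the compiler's code) splits off the instructions that both
branches of a Jcc start with, stopping at the first pair that differs or
that might touch the flags. The function names are hypothetical, and the
allow-list of flag-neutral mnemonics (MOV and LEA) is an assumption made
for the example. It also assumes the jump target has no other incoming
jumps, or that a fresh label is placed after the common instructions.

```python
import re
from typing import List, Tuple

# An instruction is (mnemonic, operand, ...) in AT&T order.
Instr = Tuple[str, ...]


def leaves_flags_alone(ins: Instr) -> bool:
    """Hypothetical allow-list: only MOV and LEA are treated as flag-neutral."""
    return re.fullmatch(r"mov[bwlq]|lea[wlq]", ins[0]) is not None


def split_common_prefix(
    fallthrough: List[Instr], target: List[Instr]
) -> Tuple[List[Instr], List[Instr], List[Instr]]:
    """Return (hoistable, rest_of_fallthrough, rest_of_target).

    'hoistable' is the run of identical, flag-neutral instructions that
    both branches begin with; it could be emitted once before the Jcc.
    """
    count = 0
    for a, b in zip(fallthrough, target):
        if a != b or not leaves_flags_alone(a):
            break
        count += 1
    return fallthrough[:count], fallthrough[count:], target[count:]
```

Applied to the two branches in the Math unit listing, this would pick
out the two movq instructions and leave each branch starting at its own
CALL.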
It might be that I have to add an extra Pass 2 optimisation that detects
"CMP/MOV/Jcc" triplets that remain and "unoptimises" the MOV/Jcc pair in
order to aid macrofusion.
Thanks for the insight, Stefan.
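
One possible shape for the detection half of such a pass, as a Python
sketch with hypothetical names. It only finds the triplets; what to do
with them is left open, since the exact macrofusion rules differ between
processors and the sketch checks nothing beyond adjacency.

```python
import re
from typing import List, Tuple

# An instruction is (mnemonic, operand, ...) in AT&T order.
Instr = Tuple[str, ...]

# Conditional jumps only: 'jmp' and the jcxz family do not match.
JCC = re.compile(r"j(n?(a|ae|b|be|c|e|g|ge|l|le|o|p|s|z)|pe|po)")


def is_compare(ins: Instr) -> bool:
    return re.fullmatch(r"(cmp|test)[bwlq]", ins[0]) is not None


def is_mov(ins: Instr) -> bool:
    return re.fullmatch(r"mov[bwlq]", ins[0]) is not None


def is_jcc(ins: Instr) -> bool:
    return JCC.fullmatch(ins[0]) is not None


def find_unfused_triplets(code: List[Instr]) -> List[int]:
    """Indices of each CMP/TEST that is separated from its Jcc by one MOV."""
    return [
        i
        for i in range(len(code) - 2)
        if is_compare(code[i]) and is_mov(code[i + 1]) and is_jcc(code[i + 2])
    ]
```

Each index returned marks a comparison whose jump is not directly
adjacent, which is the case Stefan's note about macrofusion applies to.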
Gareth aka. Kit
On 01/10/2021 19:00, Stefan Glienke via fpc-devel wrote:
> Keep in mind that usually test/cmp and jcc instructions are macrofused
> but only if they are directly adjacent.
>
> On 01.10.2021 at 18:10, J. Gareth Moreton via fpc-devel wrote:
>> Hi everyone,
>>
>> I've started playing around with an optimisation on x86 platforms
>> that looks for common instructions that appear on both branches of a
>> Jcc instruction (i.e. after the label it jumps to and after the jump
>> itself), and so far I'm having a lot of success. For example, in the
>> Math unit - before:
>>
>> ...
>> # Peephole Optimization: %rdx = %rdi; removed unnecessary instruction
>> (MovMov2MovNop 6b}
>> call fpc_do_is
>> testb %al,%al
>> je .Lj196
>> movq %rdi,%rdx
>> movq %rsi,%rcx
>> call CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
>> movb %al,%bl
>> jmp .Lj197
>> .p2align 4,,10
>> .p2align 3
>> .Lj196:
>> movq %rdi,%rdx
>> movq %rsi,%rcx
>> call SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
>> movb %al,%bl
>> .Lj197:
>> movb %bl,%al
>> ...
>>
>> After:
>>
>> ...
>> # Peephole Optimization: %rdx = %rdi; removed unnecessary instruction
>> (MovMov2MovNop 6b}
>> call fpc_do_is
>> # Peephole Optimization: Swapped test and mov instructions to improve
>> optimisation potential
>> movq %rdi,%rdx
>> # Peephole Optimization: Swapped test and mov instructions to improve
>> optimisation potential
>> movq %rsi,%rcx
>> testb %al,%al
>> # Peephole Optimization: Moved mov instruction common to both
>> branches to before jump
>> # Peephole Optimization: Moved mov instruction common to both
>> branches to before jump
>> # Peephole Optimization: Moved destination label ahead of common
>> instructions
>> je .Lj198
>> call CLASSES$_$TBITS_$__$$_EQUALS$TBITS$$BOOLEAN
>> movb %al,%bl
>> jmp .Lj197
>> .p2align 4,,10
>> .p2align 3
>> .Lj198:
>> call SYSTEM$_$TOBJECT_$__$$_EQUALS$TOBJECT$$BOOLEAN
>> movb %al,%bl
>> .Lj197:
>> movb %bl,%al
>> ...
>>
>> In the above example, the parameter configuration prior to the two
>> CALL instructions is identical, so the optimisation can move those
>> instructions to before the branching jump.
>>
>> However, some optimisations are not triggering because they expect a
>> jump or SETcc instruction to appear directly after a TEST
>> instruction, for example, and I can't just track the FLAGS register
>> because the optimisation has to check the condition that's being
>> used too (e.g. "MovAndTest2Test" requires the condition be C_E or
>> C_NE).
>>
>> There are a couple of solutions to this:
>>
>> - Some instructions like those in the post-peephole stage could be
>> adapted to look ahead further for an appropriate instruction,
>> stopping if it finds one or if it finds another instruction that
>> modifies the flags. This will produce more complicated compiler code
>> though.
>>
>> - Have a flag that tells the compiler to run pass 1 again after pass
>> 2 (and have my common instruction optimisations in pass 2). This
>> would allow deeper optimisations but may cause significant slowdown
>> in the compiler, so I would only recommend this flag be honoured
>> under -O3 and -O4.
>>
>> I'm trying to weigh the pros and cons of each, not least because in
>> some cases, my common instruction optimisations aren't as efficient
>> in pass 2 because other pass 1 optimisations ensure the instructions
>> either side of the branch are no longer identical.
>>
>> Currently I'm seeing if I can avoid rerunning pass 1 and instead
>> improve the problematic optimisations to be more flexible with the
>> location of their SETcc and Jcc instructions.
>>
>> Gareth aka. Kit
>>
>>