[fpc-devel] Finally fixed that MOVZX/SX optimisation!
J. Gareth Moreton
gareth at moreton-family.com
Fri Feb 21 00:35:19 CET 2020
Yes, it appeared under -O4. Specifying -Oodeadstore caused both
instructions to be removed, which makes sense: a call followed those
mov instructions, and that call sets %rax. Under x86_64-win64, %rax is
not used to pass a parameter either (i.e. the value of %rax is
discarded upon calling a subroutine).
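To illustrate the reasoning above, here is a minimal sketch of a backward liveness pass over a straight-line block. It is only a toy model, not how FPC's deadstore optimisation is implemented. The Instr type, the remove_dead_stores function and the call target "some_proc" are hypothetical names made up for this example, and sub-registers (%ax, %eax) are folded into their full register (%rax) for simplicity.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str               # assembly text, for display only
    reads: frozenset        # registers whose value the instruction uses
    writes: frozenset       # registers the instruction overwrites
    side_effect: bool = False  # calls, memory stores etc. must be kept

def remove_dead_stores(block, live_out):
    """Drop register writes whose result is never read (toy model)."""
    live = set(live_out)
    kept = []
    for ins in reversed(block):
        # A pure register write is dead if none of its targets is live.
        if not ins.side_effect and ins.writes and not (ins.writes & live):
            continue
        live -= ins.writes   # the old value dies at this write
        live |= ins.reads    # anything read here must be live before it
        kept.append(ins)
    kept.reverse()
    return kept

block = [
    Instr("movl $61441,%eax", frozenset(), frozenset({"rax"})),
    # A 16-bit write keeps the upper bits, so it also reads the register.
    Instr("movw $61441,%ax", frozenset({"rax"}), frozenset({"rax"})),
    # Assumption: the callee takes one argument in %rcx and returns in %rax.
    Instr("call some_proc", frozenset({"rcx"}), frozenset({"rax"}), True),
]

result = [ins.text for ins in remove_dead_stores(block, live_out={"rax"})]
print(result)
```

Because the call overwrites %rax without reading it, %rax is not live just before the call, so both mov instructions are dropped and only the call remains.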
Thanks for pointing out where peephole optimisation is wasted and a
non-issue. I need to study nodes more! *scratches off mov/cmp checks!*
Just to note, the last optimisation over at #36687, which has been
giving me hassle until now, deals mostly with constants that get
sign-extended or zero-extended. For example, in the same test, there
are sequences such as this:
... the patch now (correctly) changes that to "movl $-63,%eax".
Deadstore and lack of constant propagation aren't affected.
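As a rough illustration of folding a sign-extended or zero-extended constant into a single immediate, here is a minimal sketch. The original instruction sequence is not reproduced in this message, so the 8-bit input 0xC1 below is purely an assumed value that happens to sign-extend to -63. The function names are hypothetical and do not come from the compiler source.

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits bits of value as a signed integer."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign

def zero_extend(value, from_bits):
    """Interpret the low from_bits bits of value as an unsigned integer."""
    return value & ((1 << from_bits) - 1)

def fold_extension(const, from_bits, signed):
    """Immediate that one wider mov could load instead of a narrow mov
    followed by movsx/movzx (toy model, assumed behaviour)."""
    if signed:
        return sign_extend(const, from_bits)
    return zero_extend(const, from_bits)

# Assumed input: the 8-bit pattern 0xC1.
print(fold_extension(0xC1, 8, signed=True))   # -63
print(fold_extension(0xC1, 8, signed=False))  # 193
```

The signed result corresponds to an immediate like the one in "movl $-63,%eax".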
Gareth aka. Kit
On 20/02/2020 21:05, Florian Klämpfl wrote:
> On 20.02.20 at 21:50, J. Gareth Moreton wrote:
>> Oh, sorry, I made a slight error. The sequences only appear if you
>> specify -Oonoconstprop (and -a). So that sequence is produced with
>> "\pp\bin\x86_64-win64\ppcx64 -O4 -Oonoconstprop -a test\cg\tcnvint3b.pp"
> Then this is a non-issue.
>> There are still some inefficient combinations though in the assembly
>> - for example:
>> movl $61441,%eax
>> movw $61441,%ax
> This is with full -O3? Did you try to add -Oodeadstore?
>> Gareth aka. Kit
>> On 20/02/2020 20:45, J. Gareth Moreton wrote:
>>> On 20/02/2020 20:34, Florian Klämpfl wrote:
>>>> On 20.02.20 at 21:25, J. Gareth Moreton wrote:
>>>>> but if you run all of the "test/cg/tcnvint3" tests with the "-a"
>>>>> option, you will notice such sequences in some of the ".s" files.
>>>> With full -O3?
>>> Indeed so, with full -O4 even. When compiling
>>> "/test/cg/tcnvint3.pp" (a test that already exists) with -O4, we get
>>> things like this in the assembler dump - command line =
>>> "\pp\bin\x86_64-win64\ppcx64 -O4 test\cg\tcnvint3b.pp":
>>> # Peephole Optimization: movq $16711680,%rax -> movl $16711680,%eax
>>> (immediate can be represented with just 32 bits)
>>> movl $16711680,%eax
>>> cmpl $16711680,%eax
>>> je .Lj29
>>> call P$TCNVINT3_$$_FAIL
>>> jmp .Lj30
>>> .p2align 4,,10
>>> .p2align 3
>>> In this case, unless there's a freak CPU error, everything between
>>> "je .Lj29" and its destination label will never execute (if "je
>>> .Lj29" is changed to "jmp .Lj29", everything between them will be
>>> stripped by pass 1 of the peephole optimiser).
>>> Gareth aka. Kit
>>> fpc-devel maillist - fpc-devel at lists.freepascal.org