[fpc-devel] Finally fixed that MOVZX/SX optimisation!

Fri Feb 21 00:35:19 CET 2020

Yes, it appeared under -O4.  However, specifying -Oodeadstore caused
both instructions to be removed, which makes sense: those mov
instructions were followed by a call, which sets %rax, and under
x86_64-win64 %rax is not used to pass a parameter (i.e. the value of
%rax is discarded upon calling a subroutine).

Thanks for pointing out where peephole optimisation is wasted and a 
non-issue.  I need to study nodes more!  *scratches off mov/cmp checks!*

Just to note, the last optimisation over at #36687, which has been
giving me hassle until now, deals mostly with constants that get
sign-extended or zero-extended.  For example, in the same test, there
are sequences such as this:

     movb    $-63,%al
     movsbl    %al,%eax

... the patch now (correctly) changes that to "movl $-63,%eax".
Deadstore and lack of constant propagation aren't affected.
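
As a rough illustration of the fold described above (not the actual
patch at #36687), here is a minimal Python sketch.  It models
instructions as plain AT&T-syntax strings and only recognises an 8-bit
immediate loaded into %al that is then extended into %eax.  The function
names and the string-based representation are assumptions made purely
for this example.

```python
import re

# Toy model only: instructions are AT&T-syntax strings, and just the
# %al -> %eax case is recognised. All names here are hypothetical.
MOV_IMM_TO_AL = re.compile(r"^movb \$(-?\d+),%al$")


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a signed integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def fold_const_extension(instrs):
    """Merge 'movb $imm,%al' + 'movsbl/movzbl %al,%eax' into one 'movl'."""
    # Collapse runs of whitespace so the matching below stays simple.
    norm = [" ".join(line.split()) for line in instrs]
    out = []
    i = 0
    while i < len(norm):
        match = MOV_IMM_TO_AL.match(norm[i])
        following = norm[i + 1] if i + 1 < len(norm) else None
        if match and following == "movsbl %al,%eax":
            # Sign extension: the byte's top bit fills the upper 24 bits.
            out.append("movl $%d,%%eax" % sign_extend(int(match.group(1)), 8))
            i += 2
        elif match and following == "movzbl %al,%eax":
            # Zero extension: the upper 24 bits are cleared.
            out.append("movl $%d,%%eax" % (int(match.group(1)) & 0xFF))
            i += 2
        else:
            out.append(norm[i])
            i += 1
    return out


print(fold_const_extension(["movb    $-63,%al", "movsbl    %al,%eax"]))
```

Run on the two-instruction sequence above, this yields the single
"movl $-63,%eax".  With movzbl in place of movsbl it would give
"movl $193,%eax" instead, since the byte 0xC1 zero-extends to 193.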

Gareth aka. Kit

On 20/02/2020 21:05, Florian Klämpfl wrote:
> On 20.02.20 at 21:50, J. Gareth Moreton wrote:
>> Oh, sorry, I made a slight error.  The sequences only appear if you 
>> specify -Oonoconstprop (and -a). So that sequence is produced with 
>> "\pp\bin\x86_64-win64\ppcx64 -O4 -Oonoconstprop -a test\cg\tcnvint3b.pp"
>
> Then this is a non-issue.
>
>>
>> There are still some inefficient combinations though in the assembly 
>> - for example:
>>
>>      movl    $61441,%eax
>>      movw    $61441,%ax
>
> This is with full -O3? Did you try to add -Oodeadstore?
>
>>
>> Gareth aka. Kit
>>
>> On 20/02/2020 20:45, J. Gareth Moreton wrote:
>>> On 20/02/2020 20:34, Florian Klämpfl wrote:
>>>> On 20.02.20 at 21:25, J. Gareth Moreton wrote:
>>>>> but if you run all of the "test/cg/tcnvint3" tests with the "-a"
>>>>> option, you will notice such sequences in some of the ".s" files.
>>>>
>>>> With full -O3?
>>>
>>> Indeed so, with full -O4 even.  When compiling 
>>> "/test/cg/tcnvint3.pp" (a test that already exists) with -O4, we get 
>>> things like this in the assembler dump - command line = 
>>> "\pp\bin\x86_64-win64\ppcx64 -O4 test\cg\tcnvint3b.pp":
>>>
>>> # Peephole Optimization: movq $16711680,%rax -> movl $16711680,%eax 
>>> (immediate can be represented with just 32 bits)
>>>     movl    $16711680,%eax
>>>     cmpl    $16711680,%eax
>>>     je    .Lj29
>>>     call    P$TCNVINT3_$$_FAIL
>>>     jmp    .Lj30
>>>     .p2align 4,,10
>>>     .p2align 3
>>> .Lj29:
>>>
>>> In this case, unless there's a freak CPU error, everything between 
>>> "je .Lj29" and its destination label will never execute (if "je 
>>> .Lj29" is changed to "jmp .Lj29", everything between them will be 
>>> stripped by pass 1 of the peephole optimiser).
>>>
>>> Gareth aka. Kit
>>>
>>> _______________________________________________
>>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>>