[fpc-devel] Producing assembly with less branches?

Stefan Glienke sglienke at dsharp.org
Mon Jul 20 03:48:16 CEST 2020


On 20.07.2020 at 02:37, J. Gareth Moreton wrote:
>
> On 19/07/2020 22:37, Stefan Glienke wrote:
>> clang and gcc emit this - I would guess they detect quite some common 
>> patterns like this.
>>
>>  ...
>>   cmp     eax, edx
>>   mov     edx, -1
>>   setg    al
>>   movzx   eax, al
>>   cmovl   eax, edx
>>   ret
>
> I think I can make improvements to that already! (Note the sequence 
> above and below are in Intel notation)
>
> CMP   EAX, EDX
> MOV   EAX, 0 ; Note: don't use XOR EAX, EAX because this scrambles the 
> FLAGS register
> MOV   EDX, -1
> SETG   AL
> CMOVL EAX, EDX
> RET
>
> I believe that executes one cycle faster (20% faster for the entire 
> sequence) on modern processors because it shortens the dependency 
> chain that exists between "SETG AL; MOVZX EAX, AL; CMOVL EAX, EDX". It 
> might require some testing though to be sure.

That is what clang does (the first snippet I posted): it uses ecx for 
the 0 and zeroes it with the shorter xor before the cmp, which results 
in 16 bytes of code. gcc's version is 17 bytes, yours is 19.

Anyhow, none of them differ in execution speed, but all of them are 2.5 
times faster than the version with the double cmp and conditional jumps 
galore ;)
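
For reference, here is a minimal C sketch of the kind of source pattern under 
discussion: a three-way integer compare returning -1, 0 or 1. The function 
names are made up for illustration, and the original routine is not shown in 
this thread, so treat this as an assumption about its shape. Whether a given 
compiler turns either form into a setcc/cmov sequence depends on the compiler 
version and optimization flags.

```c
/* Hypothetical names, for illustration only. */

/* Straightforward form: two compares, which a naive code generator
   translates into two conditional jumps. */
int compare_branchy(int a, int b)
{
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* Branch-free formulation: each comparison yields 0 or 1 in C, so the
   difference is -1, 0 or 1. Unlike "a - b", this cannot overflow. */
int compare_branchless(int a, int b)
{
    return (a > b) - (a < b);
}
```

Both functions return the same result for every pair of inputs; only the 
generated code may differ.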




