[fpc-devel] Producing assembly with less branches?
Stefan Glienke
sglienke at dsharp.org
Mon Jul 20 03:48:16 CEST 2020
On 20.07.2020 at 02:37, J. Gareth Moreton wrote:
>
> On 19/07/2020 22:37, Stefan Glienke wrote:
>> clang and gcc emit this - I would guess they detect quite some common
>> patterns like this.
>>
>> ...
>> cmp eax, edx
>> mov edx, -1
>> setg al
>> movzx eax, al
>> cmovl eax, edx
>> ret
>
> I think I can make improvements to that already! (Note that the
> sequences above and below are in Intel notation)
>
> CMP EAX, EDX
> MOV EAX, 0 ; Note: don't use XOR EAX, EAX because this scrambles the
>            ; FLAGS register
> MOV EDX, -1
> SETG AL
> CMOVL EAX, EDX
> RET
>
> I believe that executes one cycle faster (20% faster for the entire
> sequence) on modern processors because it shortens the dependency
> chain that exists between "SETG AL; MOVZX EAX, AL; CMOVL EAX, EDX". It
> might require some testing though to be sure.
That is what clang does (the first snippet I posted): it uses ecx for
the 0 and sets it with the shorter xor before the cmp, which results in
16 bytes of code. gcc's version is 17 bytes, yours 19.
Anyhow, none of them differ in execution speed, but they are all 2.5
times faster than the double cmp and conditional jump galore ;)
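
For reference, here is a minimal C sketch of the kind of three-way
compare that the sequences above compute (1 if greater, -1 if less, 0 if
equal). The function names are made up for illustration and this is not
the original source from the thread. Whether a given compiler actually
emits setg/cmov for either form depends on its version and flags.

```c
#include <assert.h>
#include <limits.h>

/* Branchy form: the "double cmp and conditional jump" shape.
   (Hypothetical name, for illustration only.) */
int compare_branchy(int a, int b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

/* Branchless form: each comparison evaluates to 0 or 1 in C,
   so the difference is always -1, 0 or 1 and cannot overflow. */
int compare_branchless(int a, int b)
{
    return (a > b) - (a < b);
}

/* Check that both forms agree on a few edge cases. */
void check_compare(void)
{
    static const int samples[] = { INT_MIN, -1, 0, 1, INT_MAX };
    const int n = (int)(sizeof samples / sizeof samples[0]);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(compare_branchy(samples[i], samples[j]) ==
                   compare_branchless(samples[i], samples[j]));
}
```

Writing the result as a difference of two comparisons avoids the
overflow that a plain "a - b" would risk for operands far apart.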