[fpc-devel] Producing assembly with less branches?
J. Gareth Moreton
gareth at moreton-family.com
Mon Jul 20 02:37:37 CEST 2020
On 19/07/2020 22:37, Stefan Glienke wrote:
> clang and gcc emit this - I would guess they detect quite some common
> patterns like this.
>
> ...
> cmp eax, edx
> mov edx, -1
> setg al
> movzx eax, al
> cmovl eax, edx
> ret
I think I can make improvements to that already! (Note that the sequences
above and below are in Intel notation)
CMP EAX, EDX
MOV EAX, 0 ; Note: don't use XOR EAX, EAX here because it would
           ; overwrite the FLAGS set by CMP
MOV EDX, -1
SETG AL
CMOVL EAX, EDX
RET
I believe that executes one cycle faster (20% faster for the entire
sequence) on modern processors because it removes MOVZX from the
dependency chain "SETG AL; MOVZX EAX, AL; CMOVL EAX, EDX", leaving only
"SETG AL; CMOVL EAX, EDX". It might require some testing though to be
sure.
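To check that dropping MOVZX does not change the result, here is a rough C model of both sequences. The function names are hypothetical, and the `if` statements only stand in for CMOVL; this models register contents, not timing.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the quoted compiler output:
   SETG AL; MOVZX EAX, AL; CMOVL EAX, EDX. */
int32_t cmp3_movzx(int32_t a, int32_t b)
{
    int32_t minus_one = -1;            /* MOV EDX, -1 */
    uint8_t al = (uint8_t)(a > b);     /* SETG AL */
    int32_t eax = al;                  /* MOVZX EAX, AL */
    if (a < b) eax = minus_one;        /* CMOVL EAX, EDX */
    return eax;
}

/* Model of the proposed sequence: EAX is zeroed before SETG,
   so the upper 24 bits are already clear and MOVZX is not needed. */
int32_t cmp3_prezero(int32_t a, int32_t b)
{
    uint32_t eax = 0;                  /* MOV EAX, 0 (does not touch FLAGS) */
    int32_t minus_one = -1;            /* MOV EDX, -1 */
    /* SETG AL writes only the low byte of EAX */
    eax = (eax & 0xFFFFFF00u) | (uint32_t)(a > b);
    int32_t result = (int32_t)eax;
    if (a < b) result = minus_one;     /* CMOVL EAX, EDX */
    return result;
}

/* Compare both models over a few boundary values. */
void check_sequences_agree(void)
{
    const int32_t samples[] = { INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(cmp3_movzx(samples[i], samples[j]) ==
                   cmp3_prezero(samples[i], samples[j]));
}
```

Both models return -1, 0 or 1 for less, equal and greater, so the two sequences differ only in how many dependent instructions sit between CMP and the final value of EAX.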
The difficulty with CMOV is that its destination must be a register
(and not an 8-bit one): it can read its source from a memory address,
but it cannot write to memory, nor can it take an immediate operand. If
there are registers free at that point in the code though, one could
potentially write the constants to temporary registers beforehand, and
then assign them to the registers that matter via CMOV (e.g. as shown
above with the -1 value).
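One way to picture working around the "no write to memory" restriction is to load the old value, select in a register, and store back unconditionally. The sketch below is only an illustration: `store_if_less` and its demo are hypothetical names, and whether a given compiler actually emits CMOV for the ternary is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Conditionally update *slot without a conditional store. */
void store_if_less(int32_t *slot, int32_t a, int32_t b, int32_t value)
{
    int32_t current = *slot;                     /* plain load into a register */
    int32_t chosen = (a < b) ? value : current;  /* candidate for CMP + CMOVL */
    *slot = chosen;                              /* unconditional store back */
}

/* Small usage example. */
void demo_store_if_less(void)
{
    int32_t slot = 5;
    store_if_less(&slot, 1, 2, 9);   /* 1 < 2, so slot becomes 9 */
    assert(slot == 9);
    store_if_less(&slot, 3, 2, 7);   /* 3 < 2 is false, so slot keeps 9 */
    assert(slot == 9);
}
```

Because the store always happens, this form is only valid where writing to the location is safe regardless of the condition (for example, not for read-only memory or data shared with another thread).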
I'm all for improving the generated assembly language where I can.
There are some traps that one has to be careful of though, usually
involving false dependencies. For example, when setting registers to
-1, some compilers would use "OR EAX, -1" instead of "MOV EAX, -1" on
account of it taking fewer bytes to encode. Both Visual C++ and GCC did
this at one point, but this causes a false dependency with the previous
value of EAX so would incur a performance penalty.
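As a rough illustration of that trade-off, the sketch below records the 32-bit encodings of both instructions and models their inputs in C. The identifiers are hypothetical, and C cannot express the hardware dependency itself; the function signatures merely mirror it, since the OR form takes the old value as an input and the MOV form does not.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit mode encodings of the two ways to load -1 into EAX. */
const uint8_t or_eax_m1[]  = { 0x83, 0xC8, 0xFF };              /* OR  EAX, -1 (imm8, sign-extended) */
const uint8_t mov_eax_m1[] = { 0xB8, 0xFF, 0xFF, 0xFF, 0xFF };  /* MOV EAX, -1 (imm32) */

/* OR needs the previous register value as an input,
   even though the result never depends on it. */
uint32_t all_ones_via_or(uint32_t old_eax)
{
    return old_eax | 0xFFFFFFFFu;
}

/* MOV has no input at all. */
uint32_t all_ones_via_mov(void)
{
    return 0xFFFFFFFFu;
}

void demo_all_ones(void)
{
    assert(sizeof or_eax_m1 == 3);    /* shorter encoding */
    assert(sizeof mov_eax_m1 == 5);   /* longer, but dependency-free */
    assert(all_ones_via_or(0x12345678u) == all_ones_via_mov());
}
```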
The final thing to remember is that, by default, the i386 target will
produce code that runs on the oldest 80386 processors. CMOV was only
introduced with the Intel Pentium Pro in 1995, so it will only be used
if you compile for x86_64 or specify compiler parameters that raise the
minimum supported processor.
(It also just made me realise that Pass 2 of the peephole optimiser
would not work with virtual registers because of CMOV's restriction in
that it can't write to memory addresses, including the stack)
Gareth aka. Kit