[fpc-devel] x86: Efficiency of opposing CMOVs

Florian Klämpfl florian at freepascal.org
Sat Apr 16 11:18:53 CEST 2022



> On 16.04.2022 at 06:49, J. Gareth Moreton via fpc-devel <fpc-devel at lists.freepascal.org> wrote:
> 
> Hi everyone,
> 
> In the x86_64 assembly dumps, I frequently come across combinations such as the following:
> 
>     cmpl    %ebx,%edx
>     cmovll    %ebx,%eax
>     cmovnll    %edx,%eax
> 
> This is essentially the ternary C operator "x = cond ? trueval : falseval", or in Pascal "if (cond) then x := trueval else x := falseval;".  However, because the CMOV instructions have exactly opposite conditions, is it better to optimise it into this?
> 
>     movl    %ebx,%eax
>     cmpl    %ebx,%edx
>     cmovnll    %edx,%eax
> 
> It's smaller, but is it actually faster (or the same speed)?  At the very least, the two CMOV instructions depend on the CMP instruction being completed, but I'm not sure if the second CMOV depends on the first one being evaluated (because of %eax).  With the second block of code, the MOV and CMP instructions can execute simultaneously.
> 
> My educated guess tells me that MOV/CMP/CMOV(~c) is faster than CMP/CMOVc/CMOV(~c), but I haven't been able to find an authoritative source on this yet.

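cmov is normally slow, so the latter (CMP/CMOVc/CMOV(~c), which needs two cmovs) should be slower. A brief test shows this as well: tbench1 assigns a default and then conditionally overwrites it, while tbench2 uses a full if/else.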

$ cat tbench1.pp


procedure p;
var
  a,b,c : array[0..100] of longint;
  i,j,e,f,g : longint;
begin
    for j:=low(a) to high(a) do
      begin
        a[j]:=random(10);
        b[j]:=random(10);
      end;
    for i:=1 to 10000000 do
      for j:=low(a) to high(a) do
        begin
          e:=a[j];
          f:=b[j];
          g:=e;
          if e<f then
            g:=f;
          c[j]:=g;
        end;
end;

begin
  p;
end.

$ time ./tbench1

real	0m0.752s
user	0m0.748s
sys	0m0.004s


$ cat tbench2.pp
procedure p;
var
  a,b,c : array[0..100] of longint;
  i,j,e,f,g : longint;
begin
    for j:=low(a) to high(a) do
      begin
        a[j]:=random(10);
        b[j]:=random(10);
      end;
    for i:=1 to 10000000 do
      for j:=low(a) to high(a) do
        begin
          e:=a[j];
          f:=b[j];
          if e<f then
            g:=f
          else
            g:=e;
          c[j]:=g;
        end;
end;

begin
  p;
end.


$ time ./tbench2

real	0m0.997s
user	0m0.997s
sys	0m0.000s
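
The two programs above differ only in the inner loop body, so a rough way to compare them under identical input is to put both variants into one program and time each separately. The following is only a sketch under a few assumptions: the names RunDefaultThenOverwrite and RunIfElse are made up for illustration, the arrays are globals rather than locals (which may change the generated code compared to tbench1/tbench2), and timing uses GetTickCount64 from SysUtils, which only has millisecond resolution. Whether the compiler emits CMOV at all depends on the target and optimisation settings, so the assembler output (e.g. via -al) should be checked before drawing conclusions.

```pascal
program tbench_both;

{$mode objfpc}

uses
  SysUtils;

const
  N = 100;
  Iterations = 10000000;

type
  TData = array[0..N] of longint;

var
  a, b, c1, c2: TData;

{ Variant from tbench1: assign a default, then conditionally overwrite it. }
procedure RunDefaultThenOverwrite(out c: TData);
var
  i, j, e, f, g: longint;
begin
  for i := 1 to Iterations do
    for j := Low(a) to High(a) do
    begin
      e := a[j];
      f := b[j];
      g := e;
      if e < f then
        g := f;
      c[j] := g;
    end;
end;

{ Variant from tbench2: full if/else with two opposing assignments. }
procedure RunIfElse(out c: TData);
var
  i, j, e, f, g: longint;
begin
  for i := 1 to Iterations do
    for j := Low(a) to High(a) do
    begin
      e := a[j];
      f := b[j];
      if e < f then
        g := f
      else
        g := e;
      c[j] := g;
    end;
end;

var
  j: longint;
  t0, t1, t2: QWord;
begin
  { Same input data for both variants. }
  for j := Low(a) to High(a) do
  begin
    a[j] := Random(10);
    b[j] := Random(10);
  end;

  t0 := GetTickCount64;
  RunDefaultThenOverwrite(c1);
  t1 := GetTickCount64;
  RunIfElse(c2);
  t2 := GetTickCount64;

  { Both variants compute max(a[j], b[j]), so the results must match. }
  for j := Low(a) to High(a) do
    if c1[j] <> c2[j] then
    begin
      WriteLn('mismatch at index ', j);
      Halt(1);
    end;

  WriteLn('default+overwrite: ', t1 - t0, ' ms');
  WriteLn('if/else:           ', t2 - t1, ' ms');
end.
```

Running both variants in one process keeps the random input and the machine state comparable, and the final comparison loop guards against accidentally benchmarking two loops that do different things. The absolute numbers will differ from the 0.752s and 0.997s measured above.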
