[fpc-devel] LEA instruction speed

J. Gareth Moreton gareth at moreton-family.com
Sun Oct 8 13:38:57 CEST 2023


Sorry, I got careless and was in a rush, as both the Pascal code is 
wrong and I didn't store the result of the benchmark test, hence the 
error check at the end returned a false negative.

The benchmark code was from Rika's SHA-1 test code, which I didn't 
properly check, although I assumed the logic was to avoid counting the 
time of the internal loop as much as possible.  I should have gone with 
my gut instinct and realised that wasn't the best method.

I've attached the updated test (now called "blea" as it's a benchmark 
test) with your suggestions implemented, and an improved benchmarking 
system.  I'm not used to specifying parameters in place of registers - 
I'm too used to needing total control!

Your results from experiments with adding additional ADD instructions is 
expected, as LEA uses an AGU for computation, leaving the ALUs free for 
other tasks (like ADD), so LEA is better even if speed is equal.

Kit

On 08/10/2023 11:06, Marģers . via fpc-devel wrote:
> 1. why you leave "time:=..." in benchmark loop? It does add 50% of 
> execution time per call.
> 2. Pascal version does not match assembler version. Had to fix it.
>       //Result := X + Counter + $87654321;
>       Result:=Result + X + $87654321;
>       Result:=Result xor y;
> 3. Assembler functions can be unified to work under win64,win32, linux 
> 64, linux 32
> function Checksum_LEA(const Input, X, Y: LongWord): LongWord; 
> assembler; nostackframe;
> asm
> @Loop2:
>   LEA Input, [Input + X + $87654321]
>   XOR Input, y
>   DEC y
>   JNZ @Loop2
>   MOV EAX, Input
> end;
>
> 4. My results. Ryzen 2700x
>
>    Pascal control case: 0.7 ns/call  0.0710
>  Using LEA instruction: 0.7 ns/call  0.0700
> Using ADD instructions: 0.7 ns/call  0.0710
>
> Even thou results are equal, i was able to add 4 independent ADD 
> instructions around LEA while results didn't chance, but only 2 around 
> ADD.
>
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
-------------- next part --------------
{ %CPU=i386,x86_64 }
program blea;

{$IF not defined(CPUX86) and not defined(CPUX86_64)}
  {$FATAL This test program requires an Intel x86 or x64 processor }
{$ENDIF}

{$MODE OBJFPC}
{$ASMMODE Intel}

uses
  SysUtils;
  
type
  TBenchmarkProc = function(const Input, X, Y: LongWord): LongWord;
 

function Checksum_PAS(const Input, X, Y: LongWord): LongWord;
var
  Counter: LongWord;
begin
  Result := Input;
  Counter := Y;
  while (Counter > 0) do
    begin
      Result := Result + X + $87654321;
      Result := Result xor Counter;
      Dec(Counter);
    end;
end;

function Checksum_ADD(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop1:
  ADD Input, $87654321
  ADD Input, X
  XOR Input, Y
  DEC Y
  JNZ @Loop1
  MOV Result, Input
end;

function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, Y
  DEC Y
  JNZ @Loop2
  MOV EAX, ECX
end;

function Benchmark(const name: string; proc: TBenchmarkProc; Z, X: LongWord): LongWord;
const
  internal_reps = 1000;
var
  start: TDateTime;
  time: double;
  reps: cardinal;
begin
  Result := Z;
  reps := 0;
  start := Now;
  repeat
    inc(reps);
    Result := proc(Result, X, internal_reps);
  until (reps >= 10000);
  time := ((Now - start) * SecsPerDay) / reps / internal_reps * 1e9;
  writeln(name, ': ', time:0:ord(time < 10), ' ns/call');
end;

var
  Results: array[0..2] of LongWord;
  FailureCode: Integer;
begin
  Results[0] := Benchmark('   Pascal control case', @Checksum_PAS, 5000000, 1000);
  Results[1] := Benchmark(' Using LEA instruction', @Checksum_LEA, 5000000, 1000);
  Results[2] := Benchmark('Using ADD instructions', @Checksum_ADD, 5000000, 1000);
  
  FailureCode := 0;

  if (Results[0] <> Results[1]) then
    begin
      WriteLn('ERROR: Checksum_LEA doesn''t match control case');
      FailureCode := FailureCode or 1;
    end;
  if (Results[0] <> Results[2]) then
    begin
      WriteLn('ERROR: Checksum_ADD doesn''t match control case');
      FailureCode := FailureCode or 2
    end;
    
  if FailureCode <> 0 then
    Halt(FailureCode);
end.


More information about the fpc-devel mailing list