[fpc-devel] LEA instruction speed
J. Gareth Moreton
gareth at moreton-family.com
Sun Oct 8 13:38:57 CEST 2023
Sorry, I got careless and was in a rush: the Pascal code was wrong and
I didn't store the result of the benchmark test, hence the error check
at the end returned a false negative.
The benchmark code was from Rika's SHA-1 test code, which I didn't
properly check, although I assumed the logic was to avoid counting the
time of the internal loop as much as possible. I should have gone with
my gut instinct and realised that wasn't the best method.
I've attached the updated test (now called "blea" as it's a benchmark
test) with your suggestions implemented, and an improved benchmarking
system. I'm not used to specifying parameters in place of registers -
I'm too used to needing total control!
Your results from experimenting with additional ADD instructions are
expected, as LEA uses an AGU for computation, leaving the ALUs free for
other tasks (like ADD), so LEA is better even if speed is equal.
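A minimal sketch of what such an experiment might look like, assuming an
x86_64 target (Win64 or System V). The routine name Checksum_LEA_Padded and
the choice of EAX, R9D, R10D and R11D as scratch registers are hypothetical
and not taken from the original test; those registers are volatile and hold
none of the three parameters under either calling convention. The four ADDs
do not feed into the LEA/XOR/DEC chain, so the result should still match the
Pascal control case:

```pascal
{ %CPU=x86_64 }
program blea_padded;

{$MODE OBJFPC}
{$ASMMODE Intel}
{ 32-bit wrap-around is intended, so keep the checks off }
{$OVERFLOWCHECKS OFF}
{$RANGECHECKS OFF}

{ Pascal reference, same logic as the control case in the attached test }
function Checksum_PAS(const Input, X, Y: LongWord): LongWord;
var
  Counter: LongWord;
begin
  Result := Input;
  Counter := Y;
  while (Counter > 0) do
    begin
      Result := Result + X + $87654321;
      Result := Result xor Counter;
      Dec(Counter);
    end;
end;

{ Hypothetical variant of Checksum_LEA with four independent ADDs in the
  loop body. Y must be non-zero, as the loop tests it only after DEC. }
function Checksum_LEA_Padded(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop:
  LEA Input, [Input + X + $87654321]
  ADD EAX, 1    { scratch only; overwritten by the final MOV }
  ADD R9D, 1    { scratch }
  ADD R10D, 1   { scratch }
  ADD R11D, 1   { scratch }
  XOR Input, Y
  DEC Y         { sets ZF for the JNZ below }
  JNZ @Loop
  MOV EAX, Input
end;

var
  Expected, Actual: LongWord;
begin
  Expected := Checksum_PAS(5000000, 1000, 1000);
  Actual := Checksum_LEA_Padded(5000000, 1000, 1000);
  if Expected <> Actual then
    begin
      WriteLn('ERROR: padded LEA loop doesn''t match control case');
      Halt(1);
    end;
  WriteLn('Padded LEA loop matches control case');
end.
```

Timing this routine against an equally padded ADD version, with the number
of extra ADDs varied, would be one way to see where each loop starts to slow
down on a given processor.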
Kit
On 08/10/2023 11:06, Marģers . via fpc-devel wrote:
> 1. Why did you leave "time:=..." in the benchmark loop? It adds 50% of
> execution time per call.
> 2. Pascal version does not match assembler version. Had to fix it.
> //Result := X + Counter + $87654321;
> Result:=Result + X + $87654321;
> Result:=Result xor y;
> 3. Assembler functions can be unified to work under Win64, Win32,
> Linux 64 and Linux 32:
> function Checksum_LEA(const Input, X, Y: LongWord): LongWord;
> assembler; nostackframe;
> asm
> @Loop2:
> LEA Input, [Input + X + $87654321]
> XOR Input, y
> DEC y
> JNZ @Loop2
> MOV EAX, Input
> end;
>
> 4. My results. Ryzen 2700x
>
> Pascal control case: 0.7 ns/call 0.0710
> Using LEA instruction: 0.7 ns/call 0.0700
> Using ADD instructions: 0.7 ns/call 0.0710
>
> Even though the results are equal, I was able to add 4 independent ADD
> instructions around LEA without the results changing, but only 2 around
> ADD.
-------------- next part --------------
{ %CPU=i386,x86_64 }
program blea;

{$IF not defined(CPUX86) and not defined(CPUX86_64)}
  {$FATAL This test program requires an Intel x86 or x64 processor }
{$ENDIF}

{$MODE OBJFPC}
{$ASMMODE Intel}

uses
  SysUtils;

type
  TBenchmarkProc = function(const Input, X, Y: LongWord): LongWord;

{ Pascal reference implementation (control case) }
function Checksum_PAS(const Input, X, Y: LongWord): LongWord;
var
  Counter: LongWord;
begin
  Result := Input;
  Counter := Y;
  while (Counter > 0) do
    begin
      Result := Result + X + $87654321;
      Result := Result xor Counter;
      Dec(Counter);
    end;
end;

{ Same loop using two ADD instructions per iteration }
function Checksum_ADD(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop1:
  ADD Input, $87654321
  ADD Input, X
  XOR Input, Y
  DEC Y
  JNZ @Loop1
  MOV Result, Input
end;

{ Same loop using a single LEA in place of the two ADDs }
function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
  LEA Input, [Input + X + $87654321]
  XOR Input, Y
  DEC Y
  JNZ @Loop2
  MOV EAX, Input
end;

{ Calls proc 10000 times, each call running internal_reps loop iterations,
  and prints the average time per loop iteration in nanoseconds }
function Benchmark(const name: string; proc: TBenchmarkProc; Z, X: LongWord): LongWord;
const
  internal_reps = 1000;
var
  start: TDateTime;
  time: double;
  reps: cardinal;
begin
  Result := Z;
  reps := 0;
  start := Now;
  repeat
    inc(reps);
    Result := proc(Result, X, internal_reps);
  until (reps >= 10000);
  time := ((Now - start) * SecsPerDay) / reps / internal_reps * 1e9;
  writeln(name, ': ', time:0:ord(time < 10), ' ns/call');
end;

var
  Results: array[0..2] of LongWord;
  FailureCode: Integer;
begin
  Results[0] := Benchmark('   Pascal control case', @Checksum_PAS, 5000000, 1000);
  Results[1] := Benchmark(' Using LEA instruction', @Checksum_LEA, 5000000, 1000);
  Results[2] := Benchmark('Using ADD instructions', @Checksum_ADD, 5000000, 1000);

  { Each failing variant sets its own bit in the exit code }
  FailureCode := 0;
  if (Results[0] <> Results[1]) then
    begin
      WriteLn('ERROR: Checksum_LEA doesn''t match control case');
      FailureCode := FailureCode or 1;
    end;
  if (Results[0] <> Results[2]) then
    begin
      WriteLn('ERROR: Checksum_ADD doesn''t match control case');
      FailureCode := FailureCode or 2;
    end;
  if FailureCode <> 0 then
    Halt(FailureCode);
end.