[fpc-devel] LEA instruction speed
J. Gareth Moreton
gareth at moreton-family.com
Sat Oct 7 03:57:33 CEST 2023
Hi Tomas,
Do you think this should suffice? Originally it ran for 1,000,000
repetitions but I fear that will take way too long on a 486, so I
reduced it to 10,000.
Kit
On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel" <fpc-devel at lists.freepascal.org> wrote:
>
>
> Hii Kit,
>
>> This is mainly to Florian, but also to anyone else who can answer the question - at which point did a complex LEA instruction (using all three input operands and some other specific circumstances) get slow? Preliminary research suggests the 486 was when it gained extra latency, and then Sandy Bridge when it got particularly bad. Icy Lake seems to be the architecture where faster LEA instructions are reintroduced, but I'm not sure about AMD processors.
> I cannot answer your question, but if you prepare a test program, I can run it on an Intel 486 DX2 100 Mhz and AMD Athlon 1 GHz machines if it helps you in any way (at least I hope the 486 DX2 machine should be still able to start ;-) ).
>
> Tomas
>
> _______________________________________________
> fpc-devel maillist - fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>
-------------- next part --------------
program leatest;
{$MODE OBJFPC}
{$ASMMODE Intel}
uses
SysUtils;
type
TBenchmarkProc = function(const Input, X, Y: LongWord): LongWord;
function Checksum_PAS(const Input, X, Y: LongWord): LongWord;
var
Counter: LongWord;
begin
Result := Input;
Counter := Y;
while (Counter > 0) do
begin
Result := X + Counter + $87654321;
Dec(Counter);
end;
end;
function Checksum_ADD(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop1:
{$ifdef CPUX86_64}
ADD ECX, $87654321
ADD ECX, EDX
XOR ECX, R8D
DEC R8D
JNZ @Loop1
MOV EAX, ECX
{$else CPUX86_64}
ADD EAX, $87654321
ADD EAX, EDX
XOR EAX, ECX
DEC ECX
JNZ @Loop1
{$endif CPUX86_64}
end;
function Checksum_LEA(const Input, X, Y: LongWord): LongWord; assembler; nostackframe;
asm
@Loop2:
{$ifdef CPUX86_64}
LEA ECX, [ECX + EDX + $87654321]
XOR ECX, R8D
DEC R8D
JNZ @Loop2
MOV EAX, ECX
{$else CPUX86_64}
LEA EAX, [EAX + EDX + $87654321]
XOR EAX, ECX
DEC ECX
JNZ @Loop2
{$endif CPUX86_64}
end;
function Benchmark(const name: string; proc: TBenchmarkProc; Z, X: LongWord): LongWord;
const
internal_reps = 1000;
var
start: TDateTime;
time: double;
reps: cardinal;
begin
Result := Z;
reps := 0;
start := Now;
repeat
inc(reps);
proc(Result, X, internal_reps);
time := (Now - start) * SecsPerDay;
until (reps >= 10000);
time := time / reps / internal_reps * 1e9;
writeln(name, ': ', time:0:ord(time < 10), ' ns/call');
end;
var
Results: array[0..2] of LongWord;
FailureCode: Integer;
begin
Results[0] := Benchmark(' Pascal control case', @Checksum_PAS, 5000000, 1000);
Results[1] := Benchmark(' Using LEA instruction', @Checksum_LEA, 5000000, 1000);
Results[2] := Benchmark('Using ADD instructions', @Checksum_ADD, 5000000, 1000);
FailureCode := 0;
if (Results[0] <> Results[1]) then
begin
WriteLn('ERROR: Checksum_LEA doesn''t match control case');
FailureCode := FailureCode or 1;
end;
if (Results[0] <> Results[2]) then
begin
WriteLn('ERROR: Checksum_ADD doesn''t match control case');
FailureCode := FailureCode or 2
end;
if FailureCode <> 0 then
Halt(FailureCode);
end.
More information about the fpc-devel
mailing list