[fpc-devel] LEA instruction speed

Sat Oct 7 17:51:40 CEST 2023

On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:

Hi Kit,

> Do you think this should suffice? Originally it ran for 1,000,000
> repetitions but I fear that will take way too long on a 486, so I
> reduced it to 10,000.

OK, I tried it now. First of all, after turning on the old machine, I 
realized that it wasn't an Intel but an AMD 486 DX4 - sorry for my bad memory. 
:-( I compiled and ran the test under OS/2 there (I was too lazy to boot 
it into DOS ;-) ), but I assume that this shouldn't make any substantial 
difference. The ADD and LEA results were basically the same there, both 
around 100 ns/call. The Pascal version took about twice as long. 
Interestingly, the Pascal version compiled with FPC 3.2.2 was around 10% 
slower than the same source compiled with FPC 2.0.3. The assembler versions 
were obviously identical for both FPC versions. I also tried compiling it 
with FPC 1.0.10, and there the assembler versions were more than three times 
slower due to missing support for the nostackframe directive.

I tested it on the AMD Athlon 1 GHz machine as well, and again the 
results for LEA and ADD are basically equal (both 3.1 ns/call), while the 
Pascal version takes slightly more than twice as long (7.3 ns/call). However, 
rather surprisingly for me, the overall test run took _much_ longer 
there?! Finally, I tried compiling the test on a 64-bit machine (AMD 
A9-9425) running Linux, compiled for 64 bits with FPC 3.2.3 built from a 
fresh checkout of the 3.2 branch. The Pascal version shows about 4 ns/call, but the 
assembler version runs forever - well, certainly much longer than my 
patience lasts. I haven't tried to analyze the reasons, but that's what 
I get.

Tomas

> 
> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel" 
>> <fpc-devel at lists.freepascal.org> wrote:
>> 
>> 
>> Hi Kit,
>> 
>>> This is mainly to Florian, but also to anyone else who can answer the 
>>> question - at which point did a complex LEA instruction (using all 
>>> three input operands and some other specific circumstances) get 
>>> slow?  Preliminary research suggests the 486 was when it gained extra 
>> latency, and then Sandy Bridge when it got particularly bad.  Ice 
>> Lake seems to be the architecture where faster LEA instructions are 
>>> reintroduced, but I'm not sure about AMD processors.
>> I cannot answer your question, but if you prepare a test program, I 
>> can run it on an Intel 486 DX2 100 MHz machine and an AMD Athlon 1 GHz 
>> machine if it helps you in any way (at least I hope the 486 DX2 machine 
>> is still able to start ;-) ).
>> 
>> Tomas
>> 
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>> 