[fpc-devel] LEA instruction speed
Tomas Hajny
XHajT03 at hajny.biz
Sat Oct 7 17:51:40 CEST 2023
On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
Hi Kit,
> Do you think this should suffice? Originally it ran for 1,000,000
> repetitions but I fear that will take way too long on a 486, so I
> reduced it to 10,000.
OK, I tried it now. First of all, after turning on the old machine, I
realized that it isn't an Intel but an AMD 486 DX4 - sorry for my bad
memory. :-( I compiled and ran the test under OS/2 there (I was too lazy
to boot it to DOS ;-) ), but I assume that shouldn't make any substantial
difference. The ADD and LEA results were basically the same there, both
around 100 ns / call. The Pascal result took around twice as long.
Interestingly, the Pascal result for FPC 3.2.2 was around 10% longer
than the same source compiled with FPC 2.0.3 (the assembler versions
were obviously the same for both FPC versions). I also tried compiling it
with FPC 1.0.10, and there the assembler versions were more than three
times slower due to missing support for the nostackframe directive.
I tested it on the AMD Athlon 1 GHz machine as well and again, the
results for LEA and ADD are basically equal (both 3.1 ns/call) and the
result for Pascal is slightly more than twice that (7.3 ns/call). However,
rather surprisingly for me, the overall test run took _much_ longer
there?!
Finally, I tried compiling the test on a 64-bit machine (AMD A9-9425)
with Linux (compiled for 64 bits with FPC 3.2.3, itself built from a
fresh 3.2 branch). The Pascal version shows about 4 ns/call, but the
assembler version runs forever - well, certainly much longer than my
patience lasts. I haven't tried to analyze the reasons, but that's what
I get.
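
For comparing machines with very different clock speeds, it may help to express the per-call times as approximate clock cycles. The snippet below is only a sketch of that arithmetic: the helper name cycles_per_call is made up for illustration, and it assumes the Athlon runs at its nominal 1 GHz. The 486 DX4 is left out because its clock speed is not stated above.

```python
def cycles_per_call(ns_per_call: float, clock_hz: float) -> float:
    """Convert a measured time per call (in ns) into approximate clock cycles."""
    return ns_per_call * 1e-9 * clock_hz


# Assumption: the Athlon runs at its nominal 1 GHz clock.
ATHLON_HZ = 1e9

# Per-call timings reported above for the Athlon machine.
for label, ns in (("LEA/ADD", 3.1), ("Pascal", 7.3)):
    print(f"{label}: about {cycles_per_call(ns, ATHLON_HZ):.1f} cycles/call")
```

At 1 GHz one nanosecond is one cycle, so the numbers carry over directly; the figures include call and loop overhead, so they are not pure instruction latencies.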
Tomas
>
> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
>> <fpc-devel at lists.freepascal.org> wrote:
>>
>>
>> Hi Kit,
>>
>>> This is mainly to Florian, but also to anyone else who can answer the
>>> question - at which point did a complex LEA instruction (using all
>>> three input operands and some other specific circumstances) get
>>> slow? Preliminary research suggests the 486 was when it gained extra
>>> latency, and then Sandy Bridge when it got particularly bad. Ice
>>> Lake seems to be the architecture where faster LEA instructions are
>>> reintroduced, but I'm not sure about AMD processors.
>> I cannot answer your question, but if you prepare a test program, I
>> can run it on an Intel 486 DX2 100 MHz and AMD Athlon 1 GHz machines
>> if it helps you in any way (at least I hope the 486 DX2 machine should
>> be still able to start ;-) ).
>>
>> Tomas
>>
>> _______________________________________________
>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>