[fpc-devel] LEA instruction speed

J. Gareth Moreton gareth at moreton-family.com
Sun Oct 8 15:10:33 CEST 2023


Hi Nataraj

Which processor was that run on? (Although the results are too close to 
call, they imply that LEA has a latency of 2 cycles in that case.)

Kit

On 08/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
> Hi
>
> [nataraj@dflyHP ~]$ fpc ttt.pas
> Free Pascal Compiler version 3.2.2 [2023/07/04] for x86_64
> Copyright (c) 1993-2021 by Florian Klaempfl and others
> Target OS: DragonFly for x86-64
> Compiling ttt.pas
> Linking ttt
> /usr/local/bin/ld.bfd: warning: /usr/local/lib/fpc/3.2.2/units/x86_64-dragonfly/rtl/prt0.o: missing .note.GNU-stack section implies executable stack
> /usr/local/bin/ld.bfd: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
> 121 lines compiled, 14.9 sec
> [nataraj@dflyHP ~]$ ./ttt
>    Pascal control case: 6.7 ns/call
>  Using LEA instruction: 4.2 ns/call
> Using ADD instructions: 4.0 ns/call
>
>
> Nataraj S Narayan
> Synergy Info Systems
> Software & Technology Consultants
> Ettumanoor, INDIA
> Ph:+91 9443211326
>
>
> On Sat, Oct 7, 2023 at 9:39 PM J. Gareth Moreton via fpc-devel 
> <fpc-devel at lists.freepascal.org> wrote:
>
>     That's interesting; I would like to see the assembly output for the
>     Pascal control cases.  As for the 64-bit version, that was my fault:
>     the assembly language was written for Microsoft's ABI rather than the
>     System V ABI, so it was checking a register with an undefined value.
>     Find attached the fixed test.
>
>     Kit
>
>     P.S. Results on my Intel(R) Core(TM) i7-10750H
>
>        Pascal control case: 2.0 ns/call
>      Using LEA instruction: 1.7 ns/call
>     Using ADD instructions: 1.3 ns/call
>
>     On 07/10/2023 16:51, Tomas Hajny via fpc-devel wrote:
>     > On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
>     >
>     >
>     > Hi Kit,
>     >
>     >> Do you think this should suffice? Originally it ran for 1,000,000
>     >> repetitions but I fear that will take way too long on a 486, so I
>     >> reduced it to 10,000.
>     >
>     > OK, I tried it now. First of all, after turning on the old machine,
>     > I realized that it wasn't an Intel but an AMD 486 DX4 - sorry for my
>     > bad memory. :-( I compiled and ran the test under OS/2 there (I was
>     > too lazy to boot it to DOS ;-) ), but I assume that it shouldn't make
>     > any substantial difference. The ADD and LEA results were basically
>     > the same there, both around 100 ns/call. The Pascal result was around
>     > twice as long. Interestingly, the Pascal result for FPC 3.2.2 was
>     > around 10% longer than the same source compiled with FPC 2.0.3 (the
>     > assembler versions were obviously the same for both FPC versions; I
>     > tried compiling it also with FPC 1.0.10 and the assembler versions
>     > were more than three times slower due to missing support for the
>     > nostackframe directive).
>     >
>     > I tested it on the AMD Athlon 1 GHz machine as well and again, the
>     > results for LEA and ADD are basically equal (both 3.1 ns/call) and
>     > the result for Pascal is slightly more than twice that (7.3 ns/call).
>     > However, rather surprisingly for me, the overall test run was _much_
>     > longer there?! Finally, I tried compiling the test on a 64-bit
>     > machine (AMD A9-9425) with Linux (compiled for 64 bits with FPC 3.2.3
>     > built from a fresh 3.2 branch). The Pascal version shows about
>     > 4 ns/call, but the assembler version runs forever - well, certainly
>     > much longer than my patience lasts. I haven't tried to analyze the
>     > reasons, but that's what I get.
>     >
>     > Tomas
>     >
>     >
>     >
>     >>
>     >> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
>     >>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
>     >>> <fpc-devel at lists.freepascal.org> wrote:
>     >>>
>     >>>
>     >>> Hi Kit,
>     >>>
>     >>>> This is mainly to Florian, but also to anyone else who can answer
>     >>>> the question - at which point did a complex LEA instruction (using
>     >>>> all three input operands, plus some other specific circumstances)
>     >>>> get slow? Preliminary research suggests the 486 was when it gained
>     >>>> extra latency, and then Sandy Bridge when it got particularly bad.
>     >>>> Ice Lake seems to be the architecture where faster LEA instructions
>     >>>> are reintroduced, but I'm not sure about AMD processors.
>     >>> I cannot answer your question, but if you prepare a test program, I
>     >>> can run it on Intel 486 DX2 100 MHz and AMD Athlon 1 GHz machines
>     >>> if it helps you in any way (at least I hope the 486 DX2 machine
>     >>> should still be able to start ;-) ).
>     >>>
>     >>> Tomas
>     >>>
>     >>> _______________________________________________
>     >>> fpc-devel maillist  - fpc-devel at lists.freepascal.org
>     >>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>     >>>
>
>