[fpc-devel] LEA instruction speed

J. Gareth Moreton gareth at moreton-family.com
Mon Oct 9 15:38:00 CEST 2023


Thank you for the report.

According to Agner Fog's tables, complex LEA instructions should have a 
3-cycle latency on that architecture (Haswell). Optimisations with this 
instruction are proving interesting because there's so much variety 
between processor architectures. Some are fine with 3 components, but 
slow right down if a scale factor is used.
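
For anyone following along, here is a rough C sketch of what a 
"complex" (three-component) LEA computes next to an equivalent chain of 
simple additions. It is not the test program attached earlier in the 
thread; the function names, the scale factor of 4 and the displacement 
of 8 are assumptions chosen purely for illustration. It only shows that 
the two forms produce the same value and says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration; NOT the test program from this thread.
   A "complex" LEA has three components: base + index*scale + displacement.
   In assembly this might look like: lea rax, [rdi + rsi*4 + 8]
   (registers, scale 4 and displacement 8 are arbitrary assumptions). */
static uint64_t lea_style(uint64_t base, uint64_t index)
{
    /* One expression; a compiler may or may not emit a single LEA for it. */
    return base + index * 4 + 8;
}

/* The same value built from simple steps, as an ADD-based sequence would. */
static uint64_t add_style(uint64_t base, uint64_t index)
{
    uint64_t r = index;
    r += r;        /* index * 2 */
    r += r;        /* index * 4 */
    r += base;     /* + base */
    r += 8;        /* + displacement */
    return r;
}

/* Both forms must agree for every input (arithmetic wraps modulo 2^64). */
static void check_equivalence(uint64_t base, uint64_t index)
{
    assert(lea_style(base, index) == add_style(base, index));
}
```

Which of the two forms is quicker depends on the processor, as the 
figures in this thread show, so it has to be measured on each target.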

Kit

On 09/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
> Hi Gareth
>
> model name : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
>
> Regards
>
> Nataraj S Narayan
> Synergy Info Systems
> Software & Technology Consultants
> Ettumanoor, INDIA
> Ph:+91 9443211326
>
>
> On Sun, Oct 8, 2023 at 6:40 PM J. Gareth Moreton via fpc-devel 
> <fpc-devel at lists.freepascal.org> wrote:
>
>     Hi Nataraj
>
>     Which processor was that run on? (Although it's too close to call,
>     it implies LEA has a latency of 2 in that case.)
>
>     Kit
>
>     On 08/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
>>     Hi
>>
>>     [nataraj at dflyHP ~]$ fpc ttt.pas
>>     Free Pascal Compiler version 3.2.2 [2023/07/04] for x86_64
>>     Copyright (c) 1993-2021 by Florian Klaempfl and others
>>     Target OS: DragonFly for x86-64
>>     Compiling ttt.pas
>>     Linking ttt
>>     /usr/local/bin/ld.bfd: warning:
>>     /usr/local/lib/fpc/3.2.2/units/x86_64-dragonfly/rtl/prt0.o:
>>     missing .note.GNU-stack section implies executable stack
>>     /usr/local/bin/ld.bfd: NOTE: This behaviour is deprecated and
>>     will be removed in a future version of the linker
>>     121 lines compiled, 14.9 sec
>>     [nataraj at dflyHP ~]$ ./ttt
>>        Pascal control case: 6.7 ns/call
>>      Using LEA instruction: 4.2 ns/call
>>     Using ADD instructions: 4.0 ns/call
>>
>>
>>     Nataraj S Narayan
>>     Synergy Info Systems
>>     Software & Technology Consultants
>>     Ettumanoor, INDIA
>>     Ph:+91 9443211326
>>
>>
>>     On Sat, Oct 7, 2023 at 9:39 PM J. Gareth Moreton via fpc-devel
>>     <fpc-devel at lists.freepascal.org> wrote:
>>
>>         That's interesting; I am interested to see the assembly output
>>         for the Pascal control cases.  As for the 64-bit version, that
>>         was my fault since the assembly language is for Microsoft's ABI
>>         rather than the System V ABI, so it was checking a register
>>         with an undefined value. Find attached the fixed test.
>>
>>         Kit
>>
>>         P.S. Results on my Intel(R) Core(TM) i7-10750H
>>
>>             Pascal control case: 2.0 ns/call
>>           Using LEA instruction: 1.7 ns/call
>>          Using ADD instructions: 1.3 ns/call
>>
>>         On 07/10/2023 16:51, Tomas Hajny via fpc-devel wrote:
>>         > On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
>>         >
>>         >
>>         > Hi Kit,
>>         >
>>         >> Do you think this should suffice? Originally it ran for
>>         >> 1,000,000 repetitions but I fear that will take way too long
>>         >> on a 486, so I reduced it to 10,000.
>>         >
>>         > OK, I tried it now. First of all, after turning on the old
>>         > machine, I realized that it wasn't Intel but AMD 486 DX4 -
>>         > sorry for my bad memory. :-( I compiled and ran the test
>>         > under OS/2 there (I was too lazy to boot it to DOS ;-) ), but
>>         > I assume that it shouldn't make any substantial difference.
>>         > The ADD and LEA results were basically the same there, both
>>         > around 100 ns/call. The Pascal result was around twice as
>>         > long. Interestingly, the Pascal result for FPC 3.2.2 was
>>         > around 10% longer than the same source compiled with FPC
>>         > 2.0.3 (the assembler versions were obviously the same for
>>         > both FPC versions; I tried compiling it also with FPC 1.0.10
>>         > and the assembler versions were more than three times slower
>>         > due to missing support for the nostackframe directive).
>>         >
>>         > I tested it on the AMD Athlon 1 GHz machine as well and
>>         > again, the results for LEA and ADD are basically equal (both
>>         > 3.1 ns/call) and the result for Pascal is slightly more than
>>         > twice that (7.3 ns/call). However, rather surprisingly for
>>         > me, the overall test run was _much_ longer there?! Finally, I
>>         > tried compiling the test on a 64-bit machine (AMD A9-9425)
>>         > with Linux (compiled for 64 bits with FPC 3.2.3 built from a
>>         > fresh 3.2 branch). The Pascal version shows about 4 ns/call,
>>         > but the assembler version runs forever - well, certainly much
>>         > longer than my patience lasts. I haven't tried to analyze the
>>         > reasons, but that's what I get.
>>         >
>>         > Tomas
>>         >
>>         >
>>         >
>>         >>
>>         >> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
>>         >>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via
>>         >>> fpc-devel" <fpc-devel at lists.freepascal.org> wrote:
>>         >>>
>>         >>>
>>         >>> Hi Kit,
>>         >>>
>>         >>>> This is mainly to Florian, but also to anyone else who can
>>         >>>> answer the question - at which point did a complex LEA
>>         >>>> instruction (using all three input operands and some other
>>         >>>> specific circumstances) get slow? Preliminary research
>>         >>>> suggests the 486 was when it gained extra latency, and then
>>         >>>> Sandy Bridge when it got particularly bad. Ice Lake seems to
>>         >>>> be the architecture where faster LEA instructions are
>>         >>>> reintroduced, but I'm not sure about AMD processors.
>>         >>> I cannot answer your question, but if you prepare a test
>>         >>> program, I can run it on an Intel 486 DX2 100 MHz and AMD
>>         >>> Athlon 1 GHz machines if it helps you in any way (at least I
>>         >>> hope the 486 DX2 machine should still be able to start ;-) ).
>>         >>>
>>         >>> Tomas
>>         >>>
>>         >>> _______________________________________________
>>         >>> fpc-devel maillist  - fpc-devel at lists.freepascal.org
>>         >>>
>>         https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>         >>>
>>
>>