[fpc-devel] LEA instruction speed
J. Gareth Moreton
gareth at moreton-family.com
Mon Oct 9 15:38:00 CEST 2023
Thank you for the report.
According to Agner Fog's tables, complex LEA instructions should have a
3-cycle latency on that architecture (Haswell). Optimisations with this
instruction are proving interesting because its behaviour varies so much
between processor architectures. Some are fine with three components
but slow right down if a scale factor is used.
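
For anyone unfamiliar with the terminology: the "components" of an LEA
are the base register, the index register (optionally multiplied by a
scale factor of 1, 2, 4 or 8) and the constant displacement. The Python
sketch below is purely illustrative and is not the attached test
program; the names lea, add_sequence and MASK64 are made up for this
example. It models only the arithmetic result, assuming 64-bit
registers that wrap around, and says nothing about timing.

```python
MASK64 = (1 << 64) - 1  # assumed: 64-bit registers with wrap-around


def lea(base, index=0, scale=1, disp=0):
    """Value computed by a single LEA dest, [base + index*scale + disp]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


def add_sequence(base, index=0, scale=1, disp=0):
    """Same value built from simple steps: one shift and two additions."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    shift = scale.bit_length() - 1          # 1 -> 0, 2 -> 1, 4 -> 2, 8 -> 3
    result = (index << shift) & MASK64      # scaled index
    result = (result + base) & MASK64       # add the base register
    result = (result + disp) & MASK64       # add the displacement
    return result


if __name__ == "__main__":
    # Three components plus a scale factor: 0x1000 + 5*4 - 8
    print(hex(lea(0x1000, 5, 4, -8)))            # prints 0x100c
    print(hex(add_sequence(0x1000, 5, 4, -8)))   # prints 0x100c
```

Both functions return the same value; the question in this thread is
only which form a given processor executes faster.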
Kit
On 09/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
> Hi Gareth
>
> model name : Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
>
> Regards
>
> Nataraj S Narayan
> Synergy Info Systems
> Software & Technology Consultants
> Ettumanoor, INDIA
> Ph:+91 9443211326
>
>
> On Sun, Oct 8, 2023 at 6:40 PM J. Gareth Moreton via fpc-devel
> <fpc-devel at lists.freepascal.org> wrote:
>
> Hi Nataraj
>
> Which processor was that run on? (Although the results are too close
> to call, they imply that LEA has a latency of 2 in that case.)
>
> Kit
>
> On 08/10/2023 14:06, Nataraj S Narayan via fpc-devel wrote:
>> Hi
>>
>> [nataraj at dflyHP ~]$ fpc ttt.pas
>> Free Pascal Compiler version 3.2.2 [2023/07/04] for x86_64
>> Copyright (c) 1993-2021 by Florian Klaempfl and others
>> Target OS: DragonFly for x86-64
>> Compiling ttt.pas
>> Linking ttt
>> /usr/local/bin/ld.bfd: warning:
>> /usr/local/lib/fpc/3.2.2/units/x86_64-dragonfly/rtl/prt0.o:
>> missing .note.GNU-stack section implies executable stack
>> /usr/local/bin/ld.bfd: NOTE: This behaviour is deprecated and
>> will be removed in a future version of the linker
>> 121 lines compiled, 14.9 sec
>> [nataraj at dflyHP ~]$ ./ttt
>> Pascal control case: 6.7 ns/call
>> Using LEA instruction: 4.2 ns/call
>> Using ADD instructions: 4.0 ns/call
>>
>>
>> Nataraj S Narayan
>> Synergy Info Systems
>> Software & Technology Consultants
>> Ettumanoor, INDIA
>> Ph:+91 9443211326
>>
>>
>> On Sat, Oct 7, 2023 at 9:39 PM J. Gareth Moreton via fpc-devel
>> <fpc-devel at lists.freepascal.org> wrote:
>>
>> That's interesting; I am interested to see the assembly output for the
>> Pascal control cases. As for the 64-bit version, that was my fault
>> since the assembly language is for Microsoft's ABI rather than the
>> System V ABI, so it was checking a register with an undefined value.
>> Find attached the fixed test.
>>
>> Kit
>>
>> P.S. Results on my Intel(R) Core(TM) i7-10750H
>>
>> Pascal control case: 2.0 ns/call
>> Using LEA instruction: 1.7 ns/call
>> Using ADD instructions: 1.3 ns/call
>>
>> On 07/10/2023 16:51, Tomas Hajny via fpc-devel wrote:
>> > On 2023-10-07 03:57, J. Gareth Moreton via fpc-devel wrote:
>> >
>> >
>> > Hi Kit,
>> >
>> >> Do you think this should suffice? Originally it ran for 1,000,000
>> >> repetitions but I fear that will take way too long on a 486, so I
>> >> reduced it to 10,000.
>> >
>> > OK, I tried it now. First of all, after turning on the old machine, I
>> > realized that it wasn't an Intel but an AMD 486 DX4 - sorry for my bad
>> > memory. :-( I compiled and ran the test under OS/2 there (I was too
>> > lazy to boot it to DOS ;-) ), but I assume that it shouldn't make any
>> > substantial difference. The ADD and LEA results were basically the
>> > same there, both around 100 ns/call. The Pascal result was around
>> > twice as long. Interestingly, the Pascal result for FPC 3.2.2 was
>> > around 10% longer than the same source compiled with FPC 2.0.3 (the
>> > assembler versions were obviously the same for both FPC versions; I
>> > tried compiling it also with FPC 1.0.10 and the assembler versions
>> > were more than three times slower due to missing support for the
>> > nostackframe directive).
>> >
>> > I tested it on the AMD Athlon 1 GHz machine as well and again, the
>> > results for LEA and ADD are basically equal (both 3.1 ns/call) and the
>> > result for Pascal slightly more than twice that (7.3 ns/call). However,
>> > rather surprisingly for me, the overall test run was _much_ longer
>> > there?! Finally, I tried compiling the test on a 64-bit machine (AMD
>> > A9-9425) with Linux (compiled for 64 bits with FPC 3.2.3 compiled from
>> > a fresh 3.2 branch). The Pascal version shows about 4 ns/call, but the
>> > assembler version runs forever - well, certainly much longer than my
>> > patience lasts. I haven't tried to analyze the reasons, but that's
>> > what I get.
>> >
>> > Tomas
>> >
>> >
>> >
>> >>
>> >> On 03/10/2023 06:30, Tomas Hajny via fpc-devel wrote:
>> >>> On October 3, 2023 03:32:34 +0200, "J. Gareth Moreton via fpc-devel"
>> >>> <fpc-devel at lists.freepascal.org> wrote:
>> >>>
>> >>>
>> >>> Hi Kit,
>> >>>
>> >>>> This is mainly to Florian, but also to anyone else who can answer
>> >>>> the question - at which point did a complex LEA instruction (using
>> >>>> all three input operands and some other specific circumstances) get
>> >>>> slow? Preliminary research suggests the 486 was when it gained
>> >>>> extra latency, and then Sandy Bridge when it got particularly bad.
>> >>>> Ice Lake seems to be the architecture where faster LEA instructions
>> >>>> are reintroduced, but I'm not sure about AMD processors.
>> >>> I cannot answer your question, but if you prepare a test program, I
>> >>> can run it on Intel 486 DX2 100 MHz and AMD Athlon 1 GHz machines
>> >>> if it helps you in any way (at least I hope the 486 DX2 machine
>> >>> should be still able to start ;-) ).
>> >>>
>> >>> Tomas
>> >>>
>> >>> _______________________________________________
>> >>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> >>>
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>> >>>