[fpc-devel] LEA instruction speed
J. Gareth Moreton
gareth at moreton-family.com
Fri Oct 27 16:02:44 CEST 2023
I should have figured. Thank you!
Kit
On 27/10/2023 01:51, Nikolay Nikolov via fpc-devel wrote:
>
> On 10/11/23 11:21, Tomas Hajny via fpc-devel wrote:
>> On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
>>> Sweet, thank you. Would you be willing to share your modified test's
>>> source? I was worried that if CPUID wasn't present it would cause a
>>> SIGILL.
>>
>> Sure, attached, but I didn't do anything special - I modified it in a
>> way allowing easy disabling of this detection for x86 by disabling
>> definition of a conditional symbol added to the source and I was
>> prepared to recompile with the functionality disabled on the old AMD
>> DX4 if needed. However, I didn't need to do so - the AMD DX4 machine
>> simply ignored it and chose the branch used in case of missing
>> support for the particular CPUID function. I have no idea if this
>> might be due to some protection in OS/2 Warp 4 (used for compiling
>> and running the test on that machine) potentially masking that
>> exception, or what was the reason. Apparently, it should be possible
>> to detect CPUID availability (albeit not 100% reliably), see
>> https://wiki.osdev.org/CPUID, but I didn't use that.
>
> There's CPUID support detection code in the Free Pascal RTL for i8086
> and i386. It's in unit cpu:
>
> function cpuid_support: boolean;
>
> Nikolay
>
>>
>> Tomas
>>
>>
>>>
>>> On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
>>>> On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
>>>>> I'm all for receiving results for all kinds of processor, as it helps
>>>>> me to make more informed choices on flags as well as confirming that
>>>>> Agner Fog's instruction tables are correct. Also, results for older
>>>>> processors can be hard to come by sometimes.
>>>>>
>>>>> Currently, most architectures have a fast LEA, and the default
>>>>> "Athlon" option lines up with this. Of the Intel architectures, the
>>>>> speed slows down on COREAVX onwards (COREI is fine), so I added a new
>>>>> COREX (for 10th generation Core) option between ZEN2 and ZEN3 to mark
>>>>> the point where LEA is fast again (its 16-bit version is also fast,
>>>>> unlike Zen 3).
>>>>>
>>>>> In the meantime I'll be looking at the benchmarking code that Stefan
>>>>> provided to see if it can and should be integrated.
>>>>>
>>>>> Thanks again everyone for the results you're giving.
>>>>
>>>> Alright, fine (I modified your test to include the CPU name as well
>>>> if possible and added an IFDEFed distinction of 32-bits versus
>>>> 64-bits):
>>>>
>>>> 32-bits:
>>>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>>>> -----------------------------------------------------
>>>> Pascal control case: 0.85 ns/call
>>>> Using LEA instruction: 0.56 ns/call
>>>> Using ADD instructions: 0.84 ns/call
>>>>
>>>> 64-bits:
>>>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>>>> -----------------------------------------------------
>>>> Pascal control case: 0.85 ns/call
>>>> Using LEA instruction: 0.56 ns/call
>>>> Using ADD instructions: 0.85 ns/call
>>>>
>>>>
>>>> 32-bits:
>>>> CPU = AMD Athlon(tm) Processor
>>>> ------------------------------
>>>> Pascal control case: 6.10 ns/call
>>>> Using LEA instruction: 3.40 ns/call
>>>> Using ADD instructions: 3.40 ns/call
>>>>
>>>>
>>>> 32-bits:
>>>> (AMD DX4 100 MHz - no CPUID name)
>>>> Pascal control case: 123 ns/call
>>>> Using LEA instruction: 72 ns/call
>>>> Using ADD instructions: 73 ns/call
>>>>
>>>> Tomas
>>
>> _______________________________________________
>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>
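
The CPUID availability check discussed in the thread could be wrapped roughly as follows. This is only a minimal sketch, assuming the cpu unit exports cpuid_support on i386 and i8086 targets as Nikolay describes. The HasCPUID wrapper and the program name are hypothetical and not part of the RTL or of Tomas's modified test. The x86_64 branch makes no RTL call at all; it relies on CPUID always being present in 64-bit mode.

```pascal
program cpuidcheck;
{$mode objfpc}

{ Pull in the RTL's cpu unit only on the targets where the thread says
  cpuid_support is provided. }
{$if defined(cpui386) or defined(cpui8086)}
uses
  cpu;
{$endif}

{ Hypothetical wrapper: returns True when it is safe to execute CPUID. }
function HasCPUID: Boolean;
begin
{$if defined(cpui386) or defined(cpui8086)}
  { Ask the RTL, so old CPUs such as a 486-class DX4 take the fallback path. }
  Result := cpuid_support;
{$elseif defined(cpux86_64)}
  { Assumption: every x86-64 processor implements CPUID. }
  Result := True;
{$else}
  { Not an x86 target, so there is no CPUID instruction to call. }
  Result := False;
{$endif}
end;

begin
  if HasCPUID then
    WriteLn('CPUID available; the CPU name can be queried')
  else
    WriteLn('CPUID not available; skipping the CPU name query');
end.
```

Guarding the CPU name lookup this way would let a single test binary run on machines like the AMD DX4 without depending on how the operating system handles an invalid opcode.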