[fpc-devel] LEA instruction speed

Nikolay Nikolov nickysn at gmail.com
Fri Oct 27 02:51:11 CEST 2023


On 10/11/23 11:21, Tomas Hajny via fpc-devel wrote:
> On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
>> Sweet, thank you.  Would you be willing to share your modified test's
>> source? I was worried that if CPUID wasn't present it would cause a
>> SIGILL.
>
> Sure, attached, but I didn't do anything special. I added a 
> conditional symbol to the source so that this detection can easily be 
> disabled on x86 by undefining it, and I was prepared to recompile 
> with the functionality disabled on the old AMD DX4 if needed. 
> However, I didn't need to do so: the AMD DX4 machine simply ignored 
> it and took the branch used when the particular CPUID function is 
> not supported. I have no idea whether this is due to some protection 
> in OS/2 Warp 4 (used for compiling and running the test on that 
> machine) potentially masking that exception, or whether there is 
> another reason. Apparently, it should be possible to detect CPUID 
> availability (albeit not 100% reliably), see 
> https://wiki.osdev.org/CPUID, but I didn't use that.

There's CPUID support detection code in the Free Pascal RTL for i8086 
and i386. It's in unit cpu:

function cpuid_support: boolean;
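
A minimal sketch of how a test program might guard its CPUID use with 
it. The program name and the CanUseCpuid wrapper are hypothetical, and 
I'm assuming the compiler-defined symbols CPUI386, CPUI8086 and 
CPUX86_64; on x86_64 the check is skipped because CPUID is always 
present there:

```pascal
program cpuidcheck;

{ Only pull in unit cpu on the targets where cpuid_support is
  known to exist (i386 and i8086). }
{$if defined(CPUI386) or defined(CPUI8086)}
uses
  cpu;
{$endif}

{ Hypothetical wrapper: returns True if it is safe to execute CPUID. }
function CanUseCpuid: Boolean;
begin
{$if defined(CPUI386) or defined(CPUI8086)}
  CanUseCpuid := cpuid_support;   { RTL detection from unit cpu }
{$elseif defined(CPUX86_64)}
  CanUseCpuid := True;            { CPUID is architecturally guaranteed }
{$else}
  CanUseCpuid := False;           { not an x86 target }
{$endif}
end;

begin
  if CanUseCpuid then
    WriteLn('CPUID available: the CPU name can be queried')
  else
    WriteLn('CPUID not available: skipping CPU name detection');
end.
```

With a guard like this, the test should not need a conditional symbol 
to be toggled by hand on old processors.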

Nikolay

>
> Tomas
>
>
>>
>> On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
>>> On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
>>>> I'm all for receiving results for all kinds of processor, as it helps
>>>> me to make more informed choices on flags as well as confirming that
>>>> Agner Fog's instruction tables are correct. Also, results for older
>>>> processors can be hard to come by sometimes.
>>>>
>>>> Currently, most architectures have a fast LEA, and the default
>>>> "Athlon" option lines up with this.  Of the Intel architectures, the
>>>> speed slows down on COREAVX onwards (COREI is fine), so I added a new
>>>> COREX (for 10th generation Core) option between ZEN2 and ZEN3 to mark
>>>> the point where LEA is fast again (its 16-bit version is also fast,
>>>> unlike Zen 3).
>>>>
>>>> In the meantime I'll be looking at the benchmarking code that Stefan
>>>> provided to see if it can and should be integrated.
>>>>
>>>> Thanks again everyone for the results you're giving.
>>>
>>> Alright, fine (I modified your test to include the CPU name as well 
>>> if possible and added an IFDEFed distinction of 32-bit versus 
>>> 64-bit):
>>>
>>> 32-bits:
>>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>>> -----------------------------------------------------
>>>    Pascal control case: 0.85 ns/call
>>>  Using LEA instruction: 0.56 ns/call
>>> Using ADD instructions: 0.84 ns/call
>>>
>>> 64-bits:
>>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>>> -----------------------------------------------------
>>>    Pascal control case: 0.85 ns/call
>>>  Using LEA instruction: 0.56 ns/call
>>> Using ADD instructions: 0.85 ns/call
>>>
>>>
>>> 32-bits:
>>> CPU = AMD Athlon(tm) Processor
>>> ------------------------------
>>>    Pascal control case: 6.10 ns/call
>>>  Using LEA instruction: 3.40 ns/call
>>> Using ADD instructions: 3.40 ns/call
>>>
>>>
>>> 32-bits:
>>> (AMD DX4 100 MHz - no CPUID name)
>>>    Pascal control case: 123 ns/call
>>>  Using LEA instruction: 72 ns/call
>>> Using ADD instructions: 73 ns/call
>>>
>>> Tomas

