[fpc-devel] LEA instruction speed
Nikolay Nikolov
nickysn at gmail.com
Fri Oct 27 02:51:11 CEST 2023
On 10/11/23 11:21, Tomas Hajny via fpc-devel wrote:
> On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
>> Sweet, thank you. Would you be willing to share your modified test's
>> source? I was worried that if CPUID wasn't present it would cause a
>> SIGILL.
>
> Sure, attached, but I didn't do anything special. I added a conditional
> symbol to the source so that this detection can easily be disabled for
> x86 by removing its definition, and I was prepared to recompile with
> the functionality disabled on the old AMD DX4 if needed. However, I
> didn't need to: the AMD DX4 machine simply ignored it and took the
> branch used when the particular CPUID function is not supported. I
> don't know whether this is due to some protection in OS/2 Warp 4 (used
> for compiling and running the test on that machine) potentially
> masking that exception, or to something else. Apparently, it should be
> possible to detect CPUID availability (albeit not 100% reliably), see
> https://wiki.osdev.org/CPUID, but I didn't use that.
There's CPUID support detection code in the Free Pascal RTL for i8086
and i386. It's in unit cpu:

  function cpuid_support: boolean;

Nikolay
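
A minimal sketch of how a test program might use that function to guard
a CPUID query. It is not the attached test's source: the program name
and the messages are made up for illustration, and the conditional
guard assumes cpuid_support is only exported by unit cpu on the i386
and i8086 targets named above.

```pascal
program CheckCpuid;  { hypothetical name, for illustration only }

{$if defined(CPUI386) or defined(CPUI8086)}
uses
  cpu;  { RTL unit that declares cpuid_support on these targets }
{$endif}

begin
{$if defined(CPUI386) or defined(CPUI8086)}
  { Only execute CPUID-dependent code when the RTL reports support. }
  if cpuid_support then
    WriteLn('CPUID available: the CPU name can be queried')
  else
    WriteLn('CPUID not available: skip the query and print a fallback label');
{$else}
  { Assumption: on other targets cpuid_support is not relied upon here. }
  WriteLn('cpuid_support not used on this target');
{$endif}
end.
```

Checking once at startup and branching on the result keeps the CPUID
instruction itself out of the code path on processors that lack it.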
>
> Tomas
>
>
>>
>> On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
>>> On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
>>>> I'm all for receiving results for all kinds of processor, as it helps
>>>> me to make more informed choices on flags as well as confirming that
>>>> Agner Fog's instruction tables are correct. Also, results for older
>>>> processors can be hard to come by sometimes.
>>>>
>>>> Currently, most architectures have a fast LEA, and the default
>>>> "Athlon" option lines up with this. Of the Intel architectures, the
>>>> speed slows down on COREAVX onwards (COREI is fine), so I added a new
>>>> COREX (for 10th generation Core) option between ZEN2 and ZEN3 to mark
>>>> the point where LEA is fast again (its 16-bit version is also fast,
>>>> unlike Zen 3).
>>>>
>>>> In the meantime I'll be looking at the benchmarking code that Stefan
>>>> provided to see if it can and should be integrated.
>>>>
>>>> Thanks again everyone for the results you're giving.
>>>
>>> Alright, fine (I modified your test to include the CPU name as well
>>> if possible and added an IFDEFed distinction of 32-bits versus
>>> 64-bits):
>>>
>>> 32-bits:
>>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>>> -----------------------------------------------------
>>> Pascal control case: 0.85 ns/call
>>> Using LEA instruction: 0.56 ns/call
>>> Using ADD instructions: 0.84 ns/call
>>>
>>> 64-bits:
>>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>>> -----------------------------------------------------
>>> Pascal control case: 0.85 ns/call
>>> Using LEA instruction: 0.56 ns/call
>>> Using ADD instructions: 0.85 ns/call
>>>
>>>
>>> 32-bits:
>>> CPU = AMD Athlon(tm) Processor
>>> ------------------------------
>>> Pascal control case: 6.10 ns/call
>>> Using LEA instruction: 3.40 ns/call
>>> Using ADD instructions: 3.40 ns/call
>>>
>>>
>>> 32-bits:
>>> (AMD DX4 100 MHz - no CPUID name)
>>> Pascal control case: 123 ns/call
>>> Using LEA instruction: 72 ns/call
>>> Using ADD instructions: 73 ns/call
>>>
>>> Tomas
>
> _______________________________________________
> fpc-devel maillist - fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel