[fpc-devel] LEA instruction speed

Tomas Hajny XHajT03 at hajny.biz
Wed Oct 11 10:21:45 CEST 2023


On 2023-10-11 04:15, J. Gareth Moreton via fpc-devel wrote:
> Sweet, thank you.  Would you be willing to share your modified test's
> source? I was worried that if CPUID wasn't present it would cause a
> SIGILL.

Sure, attached, but I didn't do anything special. I put the detection 
for x86 behind a conditional symbol added to the source, so that it can 
be disabled easily by not defining that symbol, and I was prepared to 
recompile with the detection disabled on the old AMD DX4 if needed. 
However, that turned out to be unnecessary: the AMD DX4 machine simply 
ignored it and took the branch used when the particular CPUID function 
is not supported. I have no idea whether this is due to some protection 
in OS/2 Warp 4 (used for compiling and running the test on that 
machine) masking the exception, or to something else. Apparently, it 
should be possible to detect CPUID availability (albeit not 100% 
reliably), see https://wiki.osdev.org/CPUID, but I didn't use that.
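
For illustration only, here is a minimal sketch of this kind of guarded 
detection. It is written in C rather than the Pascal of the attached 
blea2.pp and is not taken from it. It assumes GCC or Clang, whose 
<cpuid.h> provides __get_cpuid_max and __cpuid; on 32-bit x86, 
__get_cpuid_max first checks whether the ID bit in EFLAGS can be 
toggled and returns 0 if it cannot. HAVE_X86_CPUID and cpu_brand are 
hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HAVE_X86_CPUID is a hypothetical conditional symbol: leave it
   undefined to compile the detection out entirely. */
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang header, assumed to be available */
#define HAVE_X86_CPUID 1
#endif

/* Copies the CPU brand string into buf (at least 49 bytes).
   Returns 1 on success, 0 if the name cannot be obtained. */
static int cpu_brand(char *buf)
{
    assert(buf != NULL);
    buf[0] = '\0';
#ifdef HAVE_X86_CPUID
    unsigned int regs[12];
    unsigned int i;

    /* Returns 0 when CPUID itself is absent (32-bit only);
       otherwise the highest supported extended leaf. */
    if (__get_cpuid_max(0x80000000u, NULL) < 0x80000004u)
        return 0;

    /* Leaves 0x80000002..0x80000004 each hold 16 bytes of the name. */
    for (i = 0; i < 3; i++)
        __cpuid(0x80000002u + i,
                regs[4 * i], regs[4 * i + 1],
                regs[4 * i + 2], regs[4 * i + 3]);

    memcpy(buf, regs, 48);
    buf[48] = '\0';
    return 1;
#else
    return 0;   /* non-x86 build or detection disabled */
#endif
}
```

A caller would pass a 49-byte buffer and print the "CPU = ..." line 
only when cpu_brand returns 1; when it returns 0 the name is simply 
omitted. On x86-64 the availability check is moot, since CPUID is 
always present there.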

Tomas


> 
> On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
>> On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
>>> I'm all for receiving results for all kinds of processor, as it helps
>>> me to make more informed choices on flags as well as confirming that
>>> Agner Fog's instruction tables are correct. Also, results for older
>>> processors can be hard to come by sometimes.
>>> 
>>> Currently, most architectures have a fast LEA, and the default
>>> "Athlon" option lines up with this.  Of the Intel architectures, the
>>> speed slows down on COREAVX onwards (COREI is fine), so I added a new
>>> COREX (for 10th generation Core) option between ZEN2 and ZEN3 to mark
>>> the point where LEA is fast again (its 16-bit version is also fast,
>>> unlike Zen 3).
>>> 
>>> In the meantime I'll be looking at the benchmarking code that Stefan
>>> provided to see if it can and should be integrated.
>>> 
>>> Thanks again everyone for the results you're giving.
>> 
>> Alright, fine (I modified your test to also print the CPU name where 
>> possible and added an IFDEFed distinction between 32-bit and 64-bit):
>> 
>> 32-bits:
>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>> -----------------------------------------------------
>>    Pascal control case: 0.85 ns/call
>>  Using LEA instruction: 0.56 ns/call
>> Using ADD instructions: 0.84 ns/call
>> 
>> 64-bits:
>> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
>> -----------------------------------------------------
>>    Pascal control case: 0.85 ns/call
>>  Using LEA instruction: 0.56 ns/call
>> Using ADD instructions: 0.85 ns/call
>> 
>> 
>> 32-bits:
>> CPU = AMD Athlon(tm) Processor
>> ------------------------------
>>    Pascal control case: 6.10 ns/call
>>  Using LEA instruction: 3.40 ns/call
>> Using ADD instructions: 3.40 ns/call
>> 
>> 
>> 32-bits:
>> (AMD DX4 100 MHz - no CPUID name)
>>    Pascal control case: 123 ns/call
>>  Using LEA instruction: 72 ns/call
>> Using ADD instructions: 73 ns/call
>> 
>> Tomas
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: blea2.pp
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20231011/504d911b/attachment-0001.ksh>

