[fpc-devel] LEA instruction speed

J. Gareth Moreton gareth at moreton-family.com
Wed Oct 11 04:15:48 CEST 2023


Sweet, thank you.  Would you be willing to share your modified test's 
source? I was worried that if CPUID wasn't present it would cause a SIGILL.

Kit

On 11/10/2023 01:47, Tomas Hajny via fpc-devel wrote:
> On 2023-10-10 13:24, J. Gareth Moreton via fpc-devel wrote:
>> I'm all for receiving results for all kinds of processor, as it helps
>> me to make more informed choices on flags as well as confirming that
>> Agner Fog's instruction tables are correct. Also, results for older
>> processors can be hard to come by sometimes.
>>
>> Currently, most architectures have a fast LEA, and the default
>> "Athlon" option lines up with this.  Of the Intel architectures, the
>> speed slows down on COREAVX onwards (COREI is fine), so I added a new
>> COREX (for 10th generation Core) option between ZEN2 and ZEN3 to mark
>> the point where LEA is fast again (its 16-bit version is also fast,
>> unlike Zen 3).
>>
>> In the meantime I'll be looking at the benchmarking code that Stefan
>> provided to see if it can and should be integrated.
>>
>> Thanks again everyone for the results you're giving.
>
> Alright, fine (I modified your test to include the CPU name as well, 
> where available, and added an IFDEFed distinction between 32-bit and 
> 64-bit builds):
>
> 32-bits:
> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
> -----------------------------------------------------
>    Pascal control case: 0.85 ns/call
>  Using LEA instruction: 0.56 ns/call
> Using ADD instructions: 0.84 ns/call
>
> 64-bits:
> CPU = AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G
> -----------------------------------------------------
>    Pascal control case: 0.85 ns/call
>  Using LEA instruction: 0.56 ns/call
> Using ADD instructions: 0.85 ns/call
>
>
> 32-bits:
> CPU = AMD Athlon(tm) Processor
> ------------------------------
>    Pascal control case: 6.10 ns/call
>  Using LEA instruction: 3.40 ns/call
> Using ADD instructions: 3.40 ns/call
>
>
> 32-bits:
> (AMD DX4 100 MHz - no CPUID name)
>    Pascal control case: 123 ns/call
>  Using LEA instruction: 72 ns/call
> Using ADD instructions: 73 ns/call
>
> Tomas
>
>
>
>>
>> On 10/10/2023 11:54, Tomas Hajny via fpc-devel wrote:
>>> On 2023-10-10 12:19, Marco van de Voort via fpc-devel wrote:
>>>> On 10-10-2023 at 11:13, J. Gareth Moreton via fpc-devel wrote:
>>>>> Thanks Tomas,
>>>>>
>>>>> Nothing is broken, but the timing measurement isn't precise enough.
>>>>>
>>>>> Normally I have a much higher iteration count (e.g. 1,000,000), 
>>>>> but I had reduced it to 10,000 because the higher count, coupled 
>>>>> with the 1,000 iterations in the subroutines themselves, would 
>>>>> have led to 1,000,000,000 passes and hence would take in the 
>>>>> region of five to ten minutes to complete on a 16 MHz 386, for 
>>>>> example.  Rika's suggestion of running as many iterations as 
>>>>> needed until, say, 5 seconds have elapsed would help, but the 
>>>>> timing measurements would cause a lot of latency and would be 
>>>>> imprecise on very slow routines.  Still, let's see if 100,000 
>>>>> gives better results for you.
>>>>>
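For illustration only, here is a minimal sketch of the "run until a fixed
time has elapsed" idea mentioned above. It is written in Python rather than
Pascal, and it is not the benchmark used in this thread. The names
ns_per_call, routine, batch and add_constants are hypothetical. The clock is
read once per batch instead of once per call, which is one way to keep the
timing overhead small. On very slow routines a single batch may still
overshoot the time budget by a wide margin.

```python
import time


def ns_per_call(routine, min_seconds=5.0, batch=1000, clock=time.perf_counter):
    """Call `routine` in batches until at least `min_seconds` have elapsed.

    Returns (nanoseconds per call, total number of calls). The clock is
    read once per batch, so its cost is spread over `batch` calls.
    """
    calls = 0
    start = clock()
    while True:
        for _ in range(batch):
            routine()
        calls += batch
        elapsed = clock() - start
        if elapsed >= min_seconds:
            return elapsed * 1e9 / calls, calls


def add_constants():
    # Stand-in workload (assumption); the real test times assembly routines.
    x = 1
    x = x + x + 2
    return x


if __name__ == "__main__":
    ns, calls = ns_per_call(add_constants, min_seconds=0.2)
    print(f"{ns:.2f} ns/call over {calls} calls")
```

Interpreter overhead dominates the numbers this prints, so only the
structure of the loop is relevant here, not the measured values.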
>>>> I had the same problem, and now it is stable.  Ryzen 5700X (ZEN3):
>>>>
>>>>    Pascal control case: 0.7 ns/call
>>>>  Using LEA instruction: 0.4 ns/call
>>>> Using ADD instructions: 0.7 ns/call
>>>
>>> Indeed, it's much more consistent now; I've attached a new log for 
>>> both 32-bit and 64-bit versions from the Intel machine with Windows. 
>>> Apparently, ADD is still somewhat faster on such "newer" Intel 
>>> machines (at least if not considering the potential parallelism of 
>>> LEA discussed previously). I can try this version on my AMD machines 
>>> later tonight if that would be useful. Please let me know which 
>>> results would be relevant for you in that case (out of the ancient 
>>> AMD DX4, the only slightly less ancient AMD Athlon 1 GHz and the 
>>> still rather reasonable AMD A9).
>>>
>>> Tomas
>>>
>>> _______________________________________________
>>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

