[fpc-devel] LEA instruction speed

Wed Oct 11 07:10:25 CEST 2023

The LEA and ADD times are close enough that I can consider them 
identical.  And I don't think Braswell (the architecture behind that 
Celeron) supports AVX, so that lines up with COREI having a fast LEA 
instruction but not COREAVX.

Given the many different x86-compatible CPUs, I wonder if we need to 
document the best compiler parameters for end users in some way (e.g. so 
that a device driver installer can pick the most optimised binary for a 
given CPU architecture).
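
A minimal sketch of what such installer logic could look like, in Python 
for brevity.  The function names are hypothetical, and the mapping from 
CPU feature flags to target names is an assumption made for illustration: 
COREI and COREAVX are the targets discussed here, while ATHLON64 and 
COREAVX2 are what I believe the other x86-64 values to be.  Check the 
compiler's -Cp/-Op documentation before relying on any of it.

```python
# Hypothetical installer helper: choose which pre-built binary to install.
# The flag-to-target mapping below is an assumption, not FPC's own rule.

def pick_fpc_target(flags):
    """Return the most specific target name supported by the given CPU flags."""
    flags = {f.lower() for f in flags}
    if "avx2" in flags:
        return "COREAVX2"
    if "avx" in flags:
        return "COREAVX"
    if "sse4_2" in flags:
        return "COREI"
    return "ATHLON64"  # assumed baseline x86-64 build


def read_linux_cpu_flags(path="/proc/cpuinfo"):
    """Read the feature flags of the first CPU listed (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

With this mapping, a CPU that reports sse4_2 but not avx would get the 
COREI build, e.g. pick_fpc_target(read_linux_cpu_flags()) on Linux.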

Kit

On 11/10/2023 05:56, Christo Crause wrote:
> On Tue, Oct 10, 2023 at 11:13 AM J. Gareth Moreton via fpc-devel
> <fpc-devel at lists.freepascal.org> wrote:
>> Thanks Tomas,
>>
>> Nothing is broken, but the timing measurement isn't precise enough.
>>
>> Normally I have a much higher iteration count (e.g. 1,000,000), but I
>> had reduced it to 10,000 because the higher count, coupled with the
>> 1,000 iterations in the subroutines themselves, would have led to
>> 1,000,000,000 passes and hence would take in the region of five to ten
>> minutes to complete on a 16 MHz 386, for example.  Rika's suggestion of
>> running as many iterations as needed until, say, 5 seconds have elapsed
>> would help, but the timing measurements would cause a lot of latency and
>> would be imprecise on very slow routines.  Still, let's see if 100,000
>> gives better results for you.
>>
>> Kit
> Results on a modest CPU:
>
> CPU =       Intel(R) Celeron(R) CPU  N3050  @ 1.60GHz
> -----------------------------------------------------
>     Pascal control case: 6.71 ns/call
>   Using LEA instruction: 2.09 ns/call
> Using ADD instructions: 2.05 ns/call
>
> 32 bits:
>     Pascal control case: 6.78 ns/call
>   Using LEA instruction: 2.16 ns/call
> Using ADD instructions: 2.09 ns/call
>
> Results show a bit of variance; the numbers above are more or less typical.
>
> Christo
>
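
The time-budget approach from the quoted message (keep iterating until 
about 5 seconds have elapsed) can be sketched as below, in Python for 
brevity.  This is not the benchmark used in the thread; the function name 
and the batch size are assumptions.  Reading the clock once per batch 
rather than once per call keeps the timer cost small relative to the work 
being measured.

```python
import time

def bench(func, budget_s=5.0, batch=1000):
    """Call func repeatedly until roughly budget_s seconds have elapsed.

    The clock is read once per batch, so timer overhead is spread over
    `batch` calls.  Returns (nanoseconds_per_call, total_calls).
    """
    calls = 0
    start = time.perf_counter()
    while True:
        for _ in range(batch):
            func()
        calls += batch
        elapsed = time.perf_counter() - start
        if elapsed >= budget_s:
            return elapsed / calls * 1e9, calls
```

The trade-off is the one raised in the thread: a very slow routine can 
overshoot the budget by up to one full batch, so the batch size has to be 
chosen with the slowest expected routine in mind.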