[fpc-devel] LEA instruction speed
J. Gareth Moreton
gareth at moreton-family.com
Wed Oct 11 07:10:25 CEST 2023
The LEA and ADD times are close enough that I can consider them
identical. And Braswell (the architecture behind that brand of Celeron)
doesn't support AVX, I don't think, so that lines up with COREI having a
fast LEA instruction but not COREAVX.
Given the many different x86-compatible CPUs, I wonder if we need to
document the best compiler parameters for end users in some way (e.g. so
that a device driver installer can pick the most optimised binary for a
given CPU architecture).
Kit
On 11/10/2023 05:56, Christo Crause wrote:
> On Tue, Oct 10, 2023 at 11:13 AM J. Gareth Moreton via fpc-devel
> <fpc-devel at lists.freepascal.org> wrote:
>> Thanks Tomas,
>>
>> Nothing is broken, but the timing measurement isn't precise enough.
>>
>> Normally I have a much higher iteration count (e.g. 1,000,000), but I
>> had reduced it to 10,000 because the higher count, coupled with the
>> 1,000 iterations in the subroutines themselves, would have led to
>> 1,000,000,000 passes and hence would take in the region of five to ten
>> minutes to complete on a 16 MHz 386, for example. Rika's suggestion of
>> running as many iterations as needed until, say, 5 seconds have elapsed
>> would help, but the timing measurements would add a lot of latency and
>> would be imprecise on very slow routines. Still, let's see if 100,000
>> gives better results for you.
>>
>> Kit
> Results on a modest CPU:
>
> CPU = Intel(R) Celeron(R) CPU N3050 @ 1.60GHz
> -----------------------------------------------------
> Pascal control case: 6.71 ns/call
> Using LEA instruction: 2.09 ns/call
> Using ADD instructions: 2.05 ns/call
>
> 32 bits:
> Pascal control case: 6.78 ns/call
> Using LEA instruction: 2.16 ns/call
> Using ADD instructions: 2.09 ns/call
>
> Results show a bit of variance; the numbers above are more or less typical.
>
> Christo
>
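For illustration, the "run until N seconds have elapsed" idea quoted above
could be sketched roughly as follows. This is a minimal sketch in C, not the
benchmark used in this thread: the names run_for_at_least, bench_result and
sample_work are hypothetical, and the standard clock() call is assumed to be
an adequate timer. Reading the clock only once per batch keeps the timing
overhead small, at the cost of overshooting the time budget by up to one
batch on very slow routines.

```c
#include <time.h>

/* Hypothetical signature for the routine being timed. */
typedef void (*bench_fn)(void);

typedef struct {
    unsigned long calls;  /* total number of calls made */
    double seconds;       /* time measured across all batches */
    double ns_per_call;   /* average cost of one call in nanoseconds */
} bench_result;

/* Assumption: clock() is precise enough here. It reports processor time on
   most platforms, which is acceptable for a single-threaded loop. */
static double now_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}

/* Placeholder workload standing in for the LEA/ADD subroutines. */
static volatile unsigned sink;
static void sample_work(void)
{
    sink = sink * 3u + 1u;
}

/* Call fn in batches until at least min_seconds have elapsed.
   The clock is read once per batch, not once per call. */
bench_result run_for_at_least(bench_fn fn, unsigned long batch,
                              double min_seconds)
{
    bench_result r = {0, 0.0, 0.0};
    double start;
    unsigned long i;

    if (batch == 0)
        batch = 1;  /* guarantee progress and avoid dividing by zero */

    start = now_seconds();
    do {
        for (i = 0; i < batch; i++)
            fn();
        r.calls += batch;
        r.seconds = now_seconds() - start;
    } while (r.seconds < min_seconds);

    r.ns_per_call = r.seconds * 1e9 / (double)r.calls;
    return r;
}
```

Calling run_for_at_least(sample_work, 1000, 5.0) would run batches of 1,000
calls for about five seconds and report the average in ns/call. The result
includes the cost of the indirect call and the loop itself, so a real
harness would probably subtract the time of an empty control routine.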