[fpc-devel] inline... and philosophy

Fri Nov 22 10:04:55 CET 2019

Does that mean that in some very rare cases, such as a small, tight loop,
it might be better to optimise for size over speed? For example, turning
MOV EAX, $FFFFFFFF (5 bytes) into OR EAX, $FFFFFFFF encoded with a
sign-extended 8-bit immediate (3 bytes) to squeeze out two bytes, even
though the OR introduces a false dependency on the old value of EAX.
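
To make the size trade-off concrete, here is a minimal sketch that lists
the standard x86 encodings of the two instructions (B8 id for MOV EAX,
imm32 and 83 /1 ib for OR EAX, imm8) and models the sign extension of the
8-bit immediate. The helper names are hypothetical and used only for
illustration; nothing here is taken from the compiler's own code.

```python
# Encodings of two ways to set EAX to $FFFFFFFF with 32-bit operand size.
MOV_EAX_IMM32 = bytes([0xB8, 0xFF, 0xFF, 0xFF, 0xFF])  # MOV EAX, imm32 (B8 id)
OR_EAX_IMM8 = bytes([0x83, 0xC8, 0xFF])                # OR EAX, imm8 (83 /1 ib)


def sign_extend_imm8(imm8: int) -> int:
    """Sign-extend an 8-bit immediate to 32 bits, as the 83 /1 form does."""
    return (imm8 | 0xFFFFFF00) if imm8 & 0x80 else imm8


def or_eax_imm8(eax: int, imm8: int) -> int:
    """Model OR EAX, imm8 on a 32-bit register value (hypothetical helper)."""
    return (eax | sign_extend_imm8(imm8)) & 0xFFFFFFFF


def bytes_saved() -> int:
    """Difference in encoded length between the MOV and the OR form."""
    return len(MOV_EAX_IMM32) - len(OR_EAX_IMM8)


if __name__ == "__main__":
    print("bytes saved:", bytes_saved())
    print("EAX after OR:", hex(or_eax_imm8(0x12345678, 0xFF)))
```

The result of the OR is the same for any starting value of EAX, which is
why the dependency on the old value is a false one: the processor still
has to wait for it, although it never affects the outcome.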

Gareth aka. Kit

On 22/11/2019 08:29, Marģers . via fpc-devel wrote:
>> On 10/11/2019 at 11:17, Marģers . via fpc-devel wrote:
>>>> Most processors have a fairly large uop cache (up to 2048 entries for
>>>> the newest generations, IIRC), so this would only matter for the first
>>>> iteration? Do you have a reference (an Agner Fog page or similar) or a
>>>> more detailed explanation that describes this?
>>> I have to retract my statement. I don't have evidence to back it up. The code that led me to those conclusions has been discarded.
>>> I have read most of what is published on Agner Fog's page. There is nothing there to pinpoint as a reference.
>> No problem. I was just interested; I have had to write some SSE/AVX code
>> in the last few years, and hadn't heard of this.
> I did some research
>
> Manual from Agner Fog's page:
> The microarchitecture of Intel, AMD and VIA CPUs
>
> 20.17 Cache and memory access
> Level 1 code      64 kB, 4 way, 256 sets, 64 B line size, per core. Latency 4 clocks
>
> I also created some performance tests and found that if a loop crossed a 64 B line boundary, it suffered a 20% performance loss, while the measurement error was 2%.
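
The 64 B line observation quoted above can be checked for a given loop
with a little address arithmetic. The following is a minimal sketch,
assuming the loop's start address and size in bytes are known (for
example, from a disassembly listing). The function names and the sample
addresses are hypothetical, and the sketch does not reproduce the quoted
20% measurement; it only tells whether a loop body straddles a line
boundary.

```python
LINE_SIZE = 64  # L1 code cache line size in bytes, as quoted from the manual


def lines_touched(start: int, length: int, line_size: int = LINE_SIZE) -> int:
    """Number of cache lines covered by the bytes [start, start + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_line = start // line_size
    last_line = (start + length - 1) // line_size
    return last_line - first_line + 1


def crosses_line(start: int, length: int, line_size: int = LINE_SIZE) -> bool:
    """True if the code range spans more than one cache line."""
    return lines_touched(start, length, line_size) > 1


def padding_to_align(start: int, line_size: int = LINE_SIZE) -> int:
    """Bytes of padding needed to move `start` up to the next line boundary."""
    return (-start) % line_size


if __name__ == "__main__":
    # Hypothetical 24-byte loop placed 48 bytes into a line.
    start, length = 0x1030, 24
    print("crosses a line:", crosses_line(start, length))
    print("padding needed:", padding_to_align(start))
```

A loop that fits in one line after padding costs up to 63 bytes of
alignment filler, so this is another place where size and speed pull in
opposite directions.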