[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

Martin Frb lazarus at mfriebe.de
Tue Jan 4 20:33:55 CET 2022

On 04/01/2022 18:43, Jonas Maebe via fpc-devel wrote:
> On 03/01/2022 12:54, Martin Frb via fpc-devel wrote:
>> not sure if this is of interest to you, but I see you do a lot on the 
>> optimizer....
> It's very likely unrelated to anything the optimiser does or does not 
> do, and more regarding random differences in code layout. Charlie 
> posted the following video on core just yesterday, and it touches on 
> exactly this subject: https://www.youtube.com/watch?v=r-TLSBdHe1A
> Choice quote: code layout and environment variables can produce up to 
> 40% differences in performance, which is more than what even the best 
> optimizing compilers can achieve in most cases.


And yes, see my previous post. It seems to depend on which "sub-section"
of a loop falls into a 32-byte-aligned 32-byte block.
It's not even the entire loop (nor the beginning of the loop), but a
certain code block within it.

This also fits with one optimization that (even though it still depends
on chance) improved the timing in my test (both worst and best time,
though those are only *my* worst/best)
=> reducing the byte size of the loop code.
That way there are fewer 32-byte sections.
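
As a rough illustration of the arithmetic only (a sketch; the function
name `blocks_touched` and the addresses are hypothetical, not taken from
the test or from FPC), this counts how many 32-byte-aligned blocks a
piece of code overlaps, given its start address and its size in bytes:

```python
BLOCK = 32  # block size in bytes, as discussed above


def blocks_touched(start: int, size: int, block: int = BLOCK) -> int:
    """Count the block-aligned blocks overlapped by [start, start + size)."""
    if size <= 0:
        return 0
    first = start // block               # index of the block holding the first byte
    last = (start + size - 1) // block   # index of the block holding the last byte
    return last - first + 1


# Hypothetical addresses: the same 30-byte code sequence at two offsets.
print(blocks_touched(0x1000, 30))  # 1
print(blocks_touched(0x1010, 30))  # 2
```

The same bytes land in one block or straddle two depending only on where
they are placed, and a shorter sequence has fewer ways to straddle a
boundary. This says nothing about which block matters for the timing.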
