[fpc-devel] Curious about the effect of all the new optimizations....

J. Gareth Moreton gareth at moreton-family.com
Wed Mar 1 12:25:31 CET 2023


My peephole optimisations mostly save only a handful of cycles each,
which probably won't add up to much for a relatively short test.  The
most significant optimisation I can think of, although I'm not quite
sure when it was merged, is replacing division by a constant with an
equivalent reciprocal multiplication.  You'll see the biggest savings
there.  There are other difficulties too: processors are intelligent
about caching and out-of-order execution, for example, and that
disguises some inefficiencies.  Some optimisations also seek only to
reduce code size with no loss of speed.
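
As a rough illustration of the reciprocal trick, here is a minimal sketch
in C (not FPC output, and not necessarily the exact sequence the compiler
emits).  The function name is made up for this example.  It assumes an
unsigned 32-bit dividend and the constant divisor 10, for which the
well-known multiplier is ceil(2^35 / 10) = 0xCCCCCCCD:

```c
#include <stdint.h>

/* Hypothetical helper: computes x / 10 without a DIV instruction.
 * The 32-bit value is widened to 64 bits, multiplied by the
 * fixed-point reciprocal 0xCCCCCCCD (= ceil(2^35 / 10)), and the
 * product is shifted right by 35 to drop the fractional bits.
 * For every unsigned 32-bit x this equals the truncated quotient. */
static uint32_t div10_by_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

A multiply plus a shift typically costs a few cycles, whereas a hardware
division can cost tens of cycles, which is why code that divides by
constants in a hot loop should benefit the most.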

What are your timings like when compiling with COREAVX or COREAVX2?  A 
couple of recent peephole optimizations make use of BMI1 and BMI2.
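
For context, a sketch in C of the kind of bit-manipulation idioms that
single BMI1 instructions implement.  Which exact patterns the FPC
peephole optimizer rewrites is not stated here, so treat these as
textbook examples only; the function names are made up for this sketch:

```c
#include <stdint.h>

/* ANDN (BMI1): "not a, and b" in one instruction instead of NOT + AND. */
static uint32_t andn_idiom(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* BLSR (BMI1): clear the lowest set bit. */
static uint32_t blsr_idiom(uint32_t x)
{
    return x & (x - 1u);
}

/* BLSI (BMI1): isolate the lowest set bit. */
static uint32_t blsi_idiom(uint32_t x)
{
    return x & (0u - x);
}
```

Without BMI1 each of these takes two or three instructions, so the gain
per occurrence is small and only shows up in code that uses such
patterns heavily.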

I can't remember the proverb that Florian used, but it essentially boils 
down to very small changes that individually don't amount to much but 
accumulate into a noticeable difference in large numbers.

Kit

On 01/03/2023 10:32, Martin Frb via fpc-devel wrote:
> So for a while now fpc 3.3.1 has been receiving new optimizations, 
> which is great; I'm a big fan of it.
>
> And hence I thought, let's see how much of an impact they have. And in 
> my test, they had none :(
> Wondering if anyone else has measured them?
>
> My tests:
> Win-10 64 bit
> 3.3.1  905c485ff413cd48f98891e2075c814759d0c6f1
> 3.2.3  2022-02-04
> both compilers with each O2 and O4
>
> Using the testcase for FpDebug (which runs a decent spread of code).
> Testcase with O2 and O3
>
> And I got no noticeable difference.
> I also tried {$CodeAlign proc=32 loop=32} for O2 (test and fpc), also 
> no diff.
>
>
> O2 / fpc: o2 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.406
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.063
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.609
> O2 / fpc: o2 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.251
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.031
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  21.531
>
>
> O3 / fpc: o2 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.687
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.281
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.469
> O3 / fpc: o2 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  23.203
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.250
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.140
>
>
> O3 / fpc: o4 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  23.063
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.250
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.875
> O3 / fpc: o4 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.577
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.094
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.235
>
>
> {$CodeAlign proc=32 loop=32}
> O2 / fpc: def 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.453
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.328
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.656
> O2 / fpc: def 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.079
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  22.234
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug       :  21.984
>