[fpc-devel] Curious about the effect of all the new optimizations....
J. Gareth Moreton
gareth at moreton-family.com
Wed Mar 1 12:25:31 CET 2023
My peephole optimisations mostly save only a handful of cycles each time,
which probably won't add up to much over a relatively short test. The
most significant optimisation I can think of, although I'm not quite sure
when it was merged, is replacing division by a constant with an
equivalent multiplication by a (scaled) reciprocal. That's where you'll
see the biggest savings. There are other complications too: modern
processors are clever with caching and out-of-order execution, for
example, and that disguises some inefficiencies. And some optimisations
only seek to reduce code size without any loss of speed.
What are your timings like when compiling with COREAVX or COREAVX2? A
couple of recent peephole optimisations make use of BMI1 and BMI2.
I can't remember the proverb that Florian used, but it essentially boils
down to this: very small changes, each amounting to little on its own,
accumulate into a noticeable difference in large numbers.
Kit
On 01/03/2023 10:32, Martin Frb via fpc-devel wrote:
> For a while now fpc 3.3.1 has been receiving new optimizations, which is
> great; I'm a big fan of them.
>
> And hence I thought, let's see how much of an impact they have. And in
> my test, they had none :(
> Wondering if anyone else has measured them?
>
> My tests:
> Win-10 64 bit
> 3.3.1 905c485ff413cd48f98891e2075c814759d0c6f1
> 3.2.3 2022-02-04
> both compilers built with each of O2 and O4
>
> Using the testcase for FpDebug (which runs a decent spread of code).
> Testcase compiled with O2 and O3
>
> And I got no noticeable difference.
> I also tried {$CodeAlign proc=32 loop=32} for O2 (test and fpc), also
> no difference.
>
>
> O2 / fpc: o2 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.406
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.063
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.609
> O2 / fpc: o2 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.251
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.031
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 21.531
>
>
> O3 / fpc: o2 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.687
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.281
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.469
> O3 / fpc: o2 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 23.203
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.250
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.140
>
>
> O3 / fpc: o4 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 23.063
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.250
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.875
> O3 / fpc: o4 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.577
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.094
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.235
>
>
> {$CodeAlign proc=32 loop=32}
> O2 / fpc: def 323
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.453
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.328
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.656
> O2 / fpc: def 331
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.079
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 22.234
> TestWatchesValue_fpc 264_Dwarf_32Bit_FpDebug : 21.984