[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

Sun Nov 18 11:08:05 CET 2018

On 17.11.2018 at 22:28, Florian Klämpfl wrote:
> On 17.11.2018 at 22:10, Simon Kissel wrote:
>> Hi Florian,
>>
>>> With some compiler tuning and a few tricks (two changes to the code
>>> and hand-simulated peephole optimizations, but I
>>> think the compiler can do these tricks as well):
>>
>> Nice - what changes did you do?
>>
>> Changing the code of course is cheating, but there might be something
>> to learn for us, here.
> 
> I prevented the compiler from putting certain variables in registers by taking their address :) But I did so only to
> test if this helps, and for i386 it does, as deciding which variables go into registers is not that easy, but see below.
> 
>>
>> Would be great if whatever trick you did could be part of the
>> compiler.
> 
> Meanwhile the compiler can do it (not yet committed). Same VM as yesterday, all rates are a little bit lower, not sure
> why (probably too many VMs open :)), but this applies to all three executables.
> 
> florian at ubuntu32:~$ ./vipribenchmemcache_nodeps

With rev. 40346 I have committed my latest changes. As the code is still experimental, it needs to be activated on the
command line when building FPC:

make clean all "OPT=-Aas -dtls_threadvars -O4 -dSPILLING_NEW"

(add -Cp... -Op... options if the target system is known)

Compile the benchmark with (where fpcnew is the newly built fpc):

fpcnew -O4 -Sd -FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS vipribenchmemcache_nodeps.dpr
fpcnew -O4 -Sd -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS vipribenchmemcache_nodeps.dpr
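
The two invocations above are the usual two-pass whole-program optimization sequence: the first (-FW/-OW) generates WPO feedback and stores it in vipri.wpo, the second (-Fw/-Ow) loads that feedback and applies the optimizations. A minimal sketch of a wrapper for this, assuming a POSIX shell; the names FPC, WPO_FILE, DRY_RUN, run and wpo_build are made up for illustration and are not part of FPC:

```shell
#!/bin/sh
# Sketch of a two-pass WPO build. All variable and function names here
# are hypothetical helpers, not anything provided by FPC itself.
FPC="${FPC:-fpcnew}"               # the newly built compiler
WPO_FILE="${WPO_FILE:-vipri.wpo}"  # WPO feedback file
OPTS="-O4 -Sd"
WPO_OPTS="DEVIRTCALLS,OPTVMTS"

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "$@"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

wpo_build() {
    src="$1"
    # Pass 1: generate WPO feedback (-OW) and store it in the file (-FW).
    run "$FPC" $OPTS "-FW$WPO_FILE" "-OW$WPO_OPTS" "$src" || return 1
    # Pass 2: load the stored feedback (-Fw) and apply the optimizations (-Ow).
    run "$FPC" $OPTS "-Fw$WPO_FILE" "-Ow$WPO_OPTS" "$src"
}
```

After sourcing the file, calling wpo_build vipribenchmemcache_nodeps.dpr runs both passes in order and stops if the first one fails; with DRY_RUN=1 set it only prints the two command lines.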

The changes also help on arm, and arm can be built using the same command line. However, at least on a Raspi3B+ the
improvement is less significant than on i386 (still the old cache flush (?) issue, which is outside the scope of FPC?).