[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM
Jonas Maebe
jonas at freepascal.org
Fri Nov 16 23:36:10 CET 2018
On 16/11/18 22:44, Florian Klämpfl wrote:
> With some compiler tuning and a few tricks (two changes to the code and hand-simulated peephole optimizations, but I
> think the compiler can also do these tricks):
You can improve performance further by devirtualising all method calls
using WPO (whole-program optimization). First compile it with
-FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS and then again with
-Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine this gives a
small boost and also makes the results more stable).
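For reference, the two passes above can be sketched as a small Python helper that builds both fpc command lines. In the first pass the uppercase -FW/-OW options generate the feedback file; in the second pass the lowercase -Fw/-Ow options read it back and apply the optimizations. This is a minimal sketch, not part of the original setup: the helper names (`wpo_commands`, `run_wpo_build`) and the source file name used in the usage note are hypothetical, and the -MDelphi -O2 defaults are taken from the FPC 3.0.4 run below.

```python
import subprocess

WPO_FILE = "vipri.wpo"            # feedback file name used in the post
WPO_OPTS = "DEVIRTCALLS,OPTVMTS"  # WPO optimizations named in the post


def wpo_commands(source, extra=("-MDelphi", "-O2")):
    """Return the two fpc invocations of a two-pass WPO build.

    Pass 1 (-FW/-OW): compile and write WPO feedback to WPO_FILE.
    Pass 2 (-Fw/-Ow): recompile, reading WPO_FILE to apply the optimizations.
    """
    first = ["fpc", *extra, f"-FW{WPO_FILE}", f"-OW{WPO_OPTS}", source]
    second = ["fpc", *extra, f"-Fw{WPO_FILE}", f"-Ow{WPO_OPTS}", source]
    return first, second


def run_wpo_build(source):
    """Hypothetical driver: run both passes in order, stopping on failure."""
    for cmd in wpo_commands(source):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

Usage, assuming a hypothetical program file `vipribench.dpr` and fpc on the PATH: `run_wpo_build("vipribench.dpr")`. The order matters: the second pass only works once the first has produced the feedback file.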
Since I only have a preliminary LLVM version (with DWARF EH) running on
macOS, I can't provide a direct Kylix comparison. Both versions below are
x86-64. As mentioned before, a 32-bit FPC/LLVM is still quite a way
off.
* FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:
$ time ./vipribenchmemcache_nodeps
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0,
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4
.................................................................................................
Time: 5016ms = 9669059 pkts/s = 14680 MB/s
real 0m5.137s
user 0m5.042s
sys 0m0.017s
* FPC 3.3.1 + LLVM (clang from Xcode 10.1 with -O3 on FPC-generated LLVM
IR) and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no LLVM link-time
optimization):
$ time ./vipribenchmemcache_nodeps_llvm
VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0,
NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4
.................................................................................................................
Time: 5018ms = 11259466 pkts/s = 17094 MB/s
real 0m5.161s
user 0m5.060s
sys 0m0.017s
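Because the benchmark runs for a fixed duration (RunningTimeSeconds=5), the `time` figures of the two runs are nearly identical; the difference shows up in throughput instead. A quick sketch of that arithmetic, using only the figures printed above (the variable and function names are illustrative):

```python
# Throughput figures copied from the two benchmark runs above.
runs = {
    "FPC 3.0.4": {"pkts_per_s": 9_669_059, "mb_per_s": 14_680},
    "FPC 3.3.1 + LLVM": {"pkts_per_s": 11_259_466, "mb_per_s": 17_094},
}


def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


base = runs["FPC 3.0.4"]
llvm = runs["FPC 3.3.1 + LLVM"]
pkt_gain = relative_gain(llvm["pkts_per_s"], base["pkts_per_s"])
mb_gain = relative_gain(llvm["mb_per_s"], base["mb_per_s"])
print(f"packets/s: +{pkt_gain:.1f}%")
print(f"MB/s:      +{mb_gain:.1f}%")
```

Both metrics agree at roughly a 16% throughput gain for the LLVM build on this machine, with the same WPO flags applied to both runs.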
Jonas