[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

Sat Nov 17 03:52:28 CET 2018

 At the moment, I'm experimenting with overhauling the x86_64 optimizer to
see if I can reduce the number of passes through a block of code - my hope
is to greatly increase the speed of the compiler without sacrificing the
optimisations performed under -O1 and -O2.  So far I've avoided modifying
i386, because I want to use it as a control case (i.e. do my changes break
other platforms?).

 It's probably not worthy of the bounty, but I'm enjoying the challenge of
seeing if I can improve the overall speed in places.
 Gareth aka. Kit

 On Fri 16/11/18 22:58 , "Florian Klämpfl" florian at freepascal.org sent:
 On 16.11.2018 at 23:41, Florian Klämpfl wrote: 
 > On 16.11.2018 at 23:36, Jonas Maebe wrote: 
 >> On 16/11/18 22:44, Florian Klämpfl wrote: 
 >>> With some compiler tuning and a few tricks (two changes to the code and
 >>> hand-simulated peephole optimizations, though I think the compiler could
 >>> also perform these tricks itself): 
 >> 
 >> You can improve performance further by devirtualising all method calls
 >> using wpo. First compile it with -FWvipri.wpo -OWDEVIRTCALLS,OPTVMTS and
 >> next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine it
 >> gives a small boost, and also makes the results more stable). 
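
The two-pass build described above could be scripted roughly as follows. This is only a sketch: the helper names wpo_flags and build_with_wpo are hypothetical, the source file name is supplied by the caller, and using -MDelphi -O2 for both passes is an assumption based on the command line quoted further down. Only the -FW/-OW and -Fw/-Ow options themselves come from this thread.

```shell
#!/bin/sh
# Sketch of the two-pass whole-program-optimisation (WPO) build.
# wpo_flags and build_with_wpo are hypothetical helper names.

# Print the WPO options for one pass.
#   $1 = "collect" (pass 1: write the feedback file)
#        "apply"   (pass 2: read the feedback file and optimise)
#   $2 = feedback file name, e.g. vipri.wpo
wpo_flags() {
    case "$1" in
        collect) printf '%s\n' "-FW$2 -OWDEVIRTCALLS,OPTVMTS" ;;
        apply)   printf '%s\n' "-Fw$2 -OwDEVIRTCALLS,OPTVMTS" ;;
        *)       echo "usage: wpo_flags collect|apply FILE" >&2; return 1 ;;
    esac
}

# Compile the program in $1 twice: first to collect feedback, then to apply it.
# $2 optionally names the feedback file (default: vipri.wpo).
build_with_wpo() {
    src=$1
    fb=${2:-vipri.wpo}
    # The flag string is deliberately left unquoted so it splits into two options.
    # Assumption: both passes use the same base options (-MDelphi -O2).
    fpc -MDelphi -O2 $(wpo_flags collect "$fb") "$src" &&
    fpc -MDelphi -O2 $(wpo_flags apply "$fb") "$src"
}
```

Sourcing the block only defines the two functions; nothing is compiled until build_with_wpo is called with the path of the main program file.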
 >> 
 >> Since I only have a preliminary llvm version (with Dwarf EH) running on
 >> macOS, I can't provide a direct Kylix comparison. The versions below are
 >> both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way off. 
 >> 
 >> * FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS: 
 >> 
 >> $ time ./vipribenchmemcache_nodeps 
 >> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0,
 >> NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4 
 >>
.................................................................................................

 >> Time: 5016ms = 9669059 pkts/s = 14680 MB/s 
 >> 
 >> real    0m5.137s 
 >> user    0m5.042s 
 >> sys    0m0.017s 
 >> 
 >> * FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm IR)
 >> and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no LLVM link-time optimization): 
 >> 
 >> $ time ./vipribenchmemcache_nodeps_llvm 
 >> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0,
 >> NumberOfChannels=6, BufferPackets=5000, NumberOfSynchroThreads=4 
 >>
.................................................................................................................

 >> Time: 5018ms = 11259466 pkts/s = 17094 MB/s 
 >> 
 >> real    0m5.161s 
 >> user    0m5.060s 
 >> sys    0m0.017s 
 >> 
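
As a rough check on the two results quoted above, the relative throughput difference can be computed from the reported packet rates. The helper name speedup_pct is hypothetical; the two rates are copied from the benchmark output in this message.

```shell
#!/bin/sh
# Relative throughput gain of a new result over a baseline, in percent.
# speedup_pct is a hypothetical helper name.
#   $1 = baseline rate (pkts/s), $2 = new rate (pkts/s)
speedup_pct() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new / base - 1) * 100 }'
}

# FPC 3.0.4 native vs. FPC 3.3.1 + llvm, both with WPO applied.
speedup_pct 9669059 11259466   # prints 16.4
```

So on this single pair of runs the LLVM build moves roughly 16% more packets per second; the MB/s figures (14680 vs. 17094) give the same ratio.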
 > 
 > Can you test with FPC 3.1.1 native, -O4 and the following patch: 
 > 
 > compiler/nmem.pas | 2 +- 
 > 1 file changed, 1 insertion(+), 1 deletion(-) 
 > 
 > diff --git a/compiler/nmem.pas b/compiler/nmem.pas 
 > index d5c1d85e8f..52add1fd81 100644 
 > --- a/compiler/nmem.pas 
 > +++ b/compiler/nmem.pas 
 > @@ -1176,7 +1176,7 @@ implementation 
 > begin 
 > include(flags,nf_write); 
 > { see comment in tsubscriptnode.mark_write } 
 > - if not(is_implicit_pointer_object_type(left.resultdef)) then 
 > + if not(is_implicit_array_pointer(left.resultdef)) then 
 > left.mark_write; 
 > end; 
 > 
 > ? 

 Hmmm, it needs a few more of my changes to work, though it should work as is
if used only with the benchmark. 

 _______________________________________________ 
 fpc-devel maillist - fpc-devel at lists.freepascal.org 
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel