<HTML>
<style> BODY { font-family:Arial, Helvetica, sans-serif;font-size:12px; }</style>At the moment, I'm experimenting with overhauling the x86_64 optimizer to see if I can reduce the number of passes through a block of code. My hope is to greatly increase the speed of the compiler without sacrificing the optimisations performed under -O1 and -O2. So far I have deliberately left i386 unmodified so that I can use it as a control case (i.e. to check whether my changes break other platforms).<br>
<br>
<div>It's probably not worthy of the bounty, but I'm enjoying the challenge of seeing if I can improve the overall speed in places.</div><div><br>
</div><div>Gareth, a.k.a. Kit<br>
</div> <br>
<br>
<span style="font-weight: bold;">On Fri 16/11/18 22:58, "Florian Klämpfl" florian@freepascal.org sent:<br>
</span><blockquote style="BORDER-LEFT: #F5F5F5 2px solid; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px; PADDING-LEFT: 5px; PADDING-RIGHT: 0px">Am 16.11.2018 um 23:41 schrieb Florian Klämpfl:
<br>
<span style="color: rgb(102, 102, 102);">> Am 16.11.2018 um 23:36 schrieb Jonas Maebe:
</span><br>
<span style="color: rgb(102, 102, 102);">>> On 16/11/18 22:44, Florian Klämpfl wrote:
</span><br>
<span style="color: rgb(102, 102, 102);">>>> With some compiler tuning and a few tricks (two changes to the code and hand-simulated peephole optimizations, but I
</span><br>
<span style="color: rgb(102, 102, 102);">>>> think the compiler could also perform these tricks itself):
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> You can improve performance further by devirtualising all method calls using wpo. First compile it with -FWvipri.wpo
</span><br>
<span style="color: rgb(102, 102, 102);">>> -OWDEVIRTCALLS,OPTVMTS and next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine it gives a small boost,
</span><br>
<span style="color: rgb(102, 102, 102);">>> and makes the results also more stable).
</span><br>
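<br>
As a rough sketch of the two-pass WPO build described above: the first pass writes a feedback file (-FW, -OW) and the second pass reads it back and applies the optimisations (-Fw, -Ow). The helper names <code>run</code> and <code>wpo_build</code>, the <code>DRY_RUN</code> switch and the source file name are hypothetical, and using -MDelphi -O2 in both passes is an assumption based on the flags quoted for the FPC 3.0.4 run. By default the script only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print the command by default, execute it only
# when DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Hypothetical helper: two-pass whole-program optimisation build.
#   $1 = source file (name is an assumption, not from the thread)
#   $2 = feedback file (defaults to vipri.wpo as in the thread)
wpo_build() {
    src="$1"
    feedback="${2:-vipri.wpo}"
    opts="DEVIRTCALLS,OPTVMTS"
    # Pass 1: -FW stores the feedback file, -OW selects what to collect.
    run fpc -MDelphi -O2 "-FW$feedback" "-OW$opts" "$src" &&
    # Pass 2: -Fw loads the feedback file, -Ow applies the optimisations.
    run fpc -MDelphi -O2 "-Fw$feedback" "-Ow$opts" "$src"
}
```

Calling <code>wpo_build benchmark.pas</code> prints the two compiler invocations; with <code>DRY_RUN=0</code> and fpc on the PATH it would run them in order, stopping if the first pass fails.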
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> Since I only have a preliminary llvm version (with Dwarf EH) running on macOS, I can't provide a direct Kylix
</span><br>
<span style="color: rgb(102, 102, 102);">>> comparison. The versions below are both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way off.
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> * FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> $ time ./vipribenchmemcache_nodeps
</span><br>
<span style="color: rgb(102, 102, 102);">>> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, NumberOfChannels=6, BufferPackets=5000,
</span><br>
<span style="color: rgb(102, 102, 102);">>> NumberOfSynchroThreads=4
</span><br>
<span style="color: rgb(102, 102, 102);">>> .................................................................................................
</span><br>
<span style="color: rgb(102, 102, 102);">>> Time: 5016ms = 9669059 pkts/s = 14680 MB/s
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> real 0m5.137s
</span><br>
<span style="color: rgb(102, 102, 102);">>> user 0m5.042s
</span><br>
<span style="color: rgb(102, 102, 102);">>> sys 0m0.017s
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm IR) and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no
</span><br>
<span style="color: rgb(102, 102, 102);">>> LLVM link-time optimization):
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> $ time ./vipribenchmemcache_nodeps_llvm
</span><br>
<span style="color: rgb(102, 102, 102);">>> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, NumberOfChannels=6, BufferPackets=5000,
</span><br>
<span style="color: rgb(102, 102, 102);">>> NumberOfSynchroThreads=4
</span><br>
<span style="color: rgb(102, 102, 102);">>> .................................................................................................................
</span><br>
<span style="color: rgb(102, 102, 102);">>> Time: 5018ms = 11259466 pkts/s = 17094 MB/s
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>> real 0m5.161s
</span><br>
<span style="color: rgb(102, 102, 102);">>> user 0m5.060s
</span><br>
<span style="color: rgb(102, 102, 102);">>> sys 0m0.017s
</span><br>
<span style="color: rgb(102, 102, 102);">>>
</span><br>
<span style="color: rgb(102, 102, 102);">>
</span><br>
<span style="color: rgb(102, 102, 102);">> Can you test with FPC 3.1.1 native, -O4 and the following patch:
</span><br>
<span style="color: rgb(102, 102, 102);">>
</span><br>
<span style="color: rgb(102, 102, 102);">> compiler/nmem.pas | 2 +-
</span><br>
<span style="color: rgb(102, 102, 102);">> 1 file changed, 1 insertion(+), 1 deletion(-)
</span><br>
<span style="color: rgb(102, 102, 102);">>
</span><br>
<span style="color: rgb(102, 102, 102);">> diff --git a/compiler/nmem.pas b/compiler/nmem.pas
</span><br>
<span style="color: rgb(102, 102, 102);">> index d5c1d85e8f..52add1fd81 100644
</span><br>
<span style="color: rgb(102, 102, 102);">> --- a/compiler/nmem.pas
</span><br>
<span style="color: rgb(102, 102, 102);">> +++ b/compiler/nmem.pas
</span><br>
<span style="color: rgb(102, 102, 102);">> @@ -1176,7 +1176,7 @@ implementation
</span><br>
<span style="color: rgb(102, 102, 102);">> begin
</span><br>
<span style="color: rgb(102, 102, 102);">> include(flags,nf_write);
</span><br>
<span style="color: rgb(102, 102, 102);">> { see comment in tsubscriptnode.mark_write }
</span><br>
<span style="color: rgb(102, 102, 102);">> - if not(is_implicit_pointer_object_type(left.resultdef)) then
</span><br>
<span style="color: rgb(102, 102, 102);">> + if not(is_implicit_array_pointer(left.resultdef)) then
</span><br>
<span style="color: rgb(102, 102, 102);">> left.mark_write;
</span><br>
<span style="color: rgb(102, 102, 102);">> end;
</span><br>
<span style="color: rgb(102, 102, 102);">>
</span><br>
<span style="color: rgb(102, 102, 102);">> ?
</span><br>
<br>
Hmmm, it needs a few more of my changes to work, though it should work if used only with the benchmark.
<br>
<br>
_______________________________________________
<br>
fpc-devel maillist - <a href="mailto:fpc-devel@lists.freepascal.org">fpc-devel@lists.freepascal.org</a>
<br>
<a target="_blank" href="http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel</a>
<br>
<br>
<br>
</blockquote></HTML>