[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM
Florian Klämpfl
florian at freepascal.org
Fri Nov 16 23:41:47 CET 2018
On 16.11.2018 at 23:36, Jonas Maebe wrote:
> On 16/11/18 22:44, Florian Klämpfl wrote:
>> With some compiler tuning and a few tricks (two changes to the code and hand-simulated peephole optimizations, but I
>> think the compiler can also do these tricks):
>
> You can improve performance further by devirtualising all method calls using wpo. First compile it with -FWvipri.wpo
> -OWDEVIRTCALLS,OPTVMTS and next with -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (at least on my machine it gives a small boost,
> and also makes the results more stable).
>
> Since I only have a preliminary llvm version (with Dwarf EH) running on macOS, I can't provide a direct Kylix
> comparison. The versions below are both x86-64. As mentioned before, a 32 bit FPC/LLVM is still quite a way off.
>
> * FPC 3.0.4 -MDelphi -O2 -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS:
>
> $ time ./vipribenchmemcache_nodeps
> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, NumberOfChannels=6, BufferPackets=5000,
> NumberOfSynchroThreads=4
> .................................................................................................
> Time: 5016ms = 9669059 pkts/s = 14680 MB/s
>
> real 0m5.137s
> user 0m5.042s
> sys 0m0.017s
>
> * FPC 3.3.1 + llvm (clang from Xcode 10.1 with -O3 on FPC-generated llvm IR) and -Fwvipri.wpo -OwDEVIRTCALLS,OPTVMTS (no
> LLVM link-time optimization):
>
> $ time ./vipribenchmemcache_nodeps_llvm
> VipriBenchThreaded - RunningTimeSeconds=5, TestCount=100, StartSeq=0, NumberOfChannels=6, BufferPackets=5000,
> NumberOfSynchroThreads=4
> .................................................................................................................
> Time: 5018ms = 11259466 pkts/s = 17094 MB/s
>
> real 0m5.161s
> user 0m5.060s
> sys 0m0.017s
>
Can you test with FPC 3.1.1 native, -O4 and the following patch:
 compiler/nmem.pas | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/compiler/nmem.pas b/compiler/nmem.pas
index d5c1d85e8f..52add1fd81 100644
--- a/compiler/nmem.pas
+++ b/compiler/nmem.pas
@@ -1176,7 +1176,7 @@ implementation
       begin
         include(flags,nf_write);
         { see comment in tsubscriptnode.mark_write }
-        if not(is_implicit_pointer_object_type(left.resultdef)) then
+        if not(is_implicit_array_pointer(left.resultdef)) then
           left.mark_write;
       end;
?
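
For anyone who wants to reproduce the two-pass WPO build Jonas describes in the quoted text, here is a minimal sketch. The options (-MDelphi, -O2, -FW/-OW for the first pass, -Fw/-Ow for the second, DEVIRTCALLS,OPTVMTS, vipri.wpo) are taken from this thread; the function names, the FPC and DRY_RUN variables and the source file name are assumptions for illustration only. Depending on the setup, units may also need to be rebuilt in the second pass, which is not shown here.

```shell
#!/bin/sh
# Sketch of a two-pass whole-program-optimization (WPO) build with FPC.
# Assumptions (not from the thread): `fpc` is on PATH unless FPC=... is set,
# DRY_RUN=1 only prints the commands, and the source file is passed in.

run_cmd() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

wpo_build() {
    src=$1                      # main program source (hypothetical name)
    wpo=${2:-vipri.wpo}         # WPO feedback file, named as in the thread
    opts="DEVIRTCALLS,OPTVMTS"  # WPO optimizations named in the thread

    # Pass 1: compile and write the WPO feedback file (-FW / -OW).
    run_cmd "${FPC:-fpc}" -MDelphi -O2 "-FW$wpo" "-OW$opts" "$src" || return 1

    # Pass 2: recompile using the feedback file (-Fw / -Ow).
    run_cmd "${FPC:-fpc}" -MDelphi -O2 "-Fw$wpo" "-Ow$opts" "$src"
}
```

Calling `DRY_RUN=1 wpo_build prog.pas` prints the two compiler invocations without running them; dropping DRY_RUN runs them for real.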