[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM
pascaldragon at googlemail.com
Fri Nov 23 14:12:38 CET 2018
On Fri, 23 Nov 2018, 12:15, Adriaan van Os <fpc at microbizz.nl> wrote:
> Simon Kissel wrote:
> > We know about a couple of bottlenecks (fpc_pushexceptaddr /
> > RelocateThreadVar etc) which explain FPC's terrible multi-threading
> > performance, but in general, FPC's code generator really is quite
> > a mess, which we learned the hard way a couple of years ago when we
> > did optimization work on the ARM target.
> I find the phrase "FPC's terrible multi-threading performance" unjust.
> When I do multi-threading with FPC, I get a speedup of nearly N (on i386
> and x86_64), where N is the number of cores, including hyper-threaded
> cores ....
> What about taking another way, having a precise look at the source code?
> Did you profile it? What sort of work does the code do? How are the
> threads synchronized? What data structures are used?
> I don't take "the compiler is so bad" without an answer to these questions.
Simon wrote that the same code performs better when compiled with Kylix, so
there definitely are things that FPC can do better, and as Florian's work on
TLS variables showed, such changes indeed *do* make FPC perform better. I
suspect a similar improvement with DWARF exceptions, as the setjmp/longjmp
based approach *is* more expensive when no exception occurs than marking
protected code in the metadata, as DWARF and SEH64 do.
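
To illustrate where that cost comes from, here is a minimal sketch in C. It
is not FPC's actual RTL code: the names exc_frame, protected_call, raise_exc
and setup_cost_without_exceptions are made up for this example, and a real
runtime would keep the frame chain in a thread variable. The point is only
that with a setjmp/longjmp scheme every entry into a protected block has to
save registers and link a frame, whether or not anything is ever raised.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical frame record, illustrative only. */
typedef struct exc_frame {
    jmp_buf env;             /* register state saved on entry to the block */
    struct exc_frame *prev;  /* enclosing protected block, if any */
} exc_frame;

static exc_frame *exc_top = NULL;  /* would be per-thread in a real runtime */
static long frames_pushed = 0;     /* counts how often the setup work ran */

/* "raise": pop the innermost frame and jump back into its setjmp. */
void raise_exc(void) {
    exc_frame *f = exc_top;
    assert(f != NULL);       /* raising outside any protected block */
    exc_top = f->prev;
    longjmp(f->env, 1);
}

/* Run body inside a protected block.
   Returns 0 if body finished normally, 1 if it raised. */
int protected_call(void (*body)(void)) {
    exc_frame frame;
    frame.prev = exc_top;    /* this setup is paid on every entry ... */
    exc_top = &frame;
    frames_pushed++;
    if (setjmp(frame.env) == 0) {   /* ... including the register save */
        body();
        exc_top = frame.prev;       /* normal exit: unlink the frame */
        return 0;
    }
    return 1;                /* reached via longjmp: the "except" path */
}

void quiet_body(void) { /* never raises */ }
void failing_body(void) { raise_exc(); }

/* Number of frames set up for `iterations` calls that never raise. */
long setup_cost_without_exceptions(int iterations) {
    long before = frames_pushed;
    for (int i = 0; i < iterations; i++)
        protected_call(quiet_body);
    return frames_pushed - before;
}
```

Calling setup_cost_without_exceptions(1000) sets up 1000 frames although
nothing is raised. With a table based scheme the compiler instead records
which code ranges are protected, and the unwinder consults those tables only
once an exception is actually raised, so the non-raising path does not need
this setup.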