[fpc-devel] The 15k bounty: Optimizing executable speed for Linux x86 / LLVM

Simon Kissel simon.kissel at nerdherrschaft.com
Fri Nov 23 21:07:53 CET 2018


Hi Adriaan,

In case you aren't just trolling and the subject really is of
interest to you, I would recommend reading the discussion
thread in full. That works much better than treating this
like a write-only system.

> You didn't answer any of my questions. The goal is to get the
> code faster, isn't it.

No, the goal is not to get any specific code faster. The goal
is to have the compiler and/or RTL improved so that all compiled
code benefits, and so that execution speed in general gets on
par with the 15-year-old Kylix/Delphi 7 compilers.

And yes, of course we have been profiling our code for years,
and we know what we are doing and talking about. Sadly, our code
does not have any bottlenecks in the sense of a small number of
functions eating most of the CPU; the load is pretty evenly
distributed across all of the functions, so the problem is
spread all across the code. However, two things do stick out,
sitting at the very top of the profile for pretty much all
multi-threaded code we compile:

fpc_pushexceptaddr & CRelocateThreadVar.
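
For illustration only, here is a minimal sketch (not taken from
our code base) of the kind of routine in which these two helpers
tend to show up. It assumes the setjmp-based exception handling
and the cthreads thread manager as used by FPC on Linux x86; the
names Counter and Tag are hypothetical.

```pascal
program hotspots;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cthreads,{$endif}
  SysUtils;

threadvar
  Counter: Integer;  // hypothetical per-thread counter

// A managed local (AnsiString) makes the compiler wrap the body in
// an implicit try..finally so the string is released on unwinding.
// Assumption: on this target, setting up that frame goes through
// fpc_pushexceptaddr.
function Tag(I: Integer): string;
var
  S: string;
begin
  S := IntToStr(I);
  // Assumption: with cthreads installed, a threadvar access is
  // resolved at run time via CRelocateThreadVar.
  Inc(Counter);
  Result := S + '!';
end;

var
  I: Integer;
begin
  for I := 1 to 1000000 do
    Tag(I);
  WriteLn(Counter);
end.
```

Neither cost is visible in the source, which is why it ends up
spread over every routine that uses strings or threadvars rather
than concentrated in one hot function.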

Besides this, not everything can be uncovered by profiling,
and this next part is nothing that FPC can change: on one of
the ARM platforms we use, every context switch results in a
CPU cache flush, so simply having more threads makes *all* of
them slower.

The benchmark code, like our real-life code, is able to utilize
~99% of the CPU, so no, it's also not a matter of thread
synchronization (we aren't spinlocking).

The commercial reason behind putting out a 15k bounty is that
no matter how much more money I invest in optimizing my own
code, it won't get much better than it is today, and that
Kylix producing faster code does not compensate for its lack
of the nice-to-have language features that FPC has today.

Simon






