[fpc-devel] Re: Comparison FPC 2.6.2 - Kylix 3

Mon Mar 4 12:05:37 CET 2013

On 04.03.2013 01:00, Graeme Geldenhuys wrote:
> 
> 4.4 seconds (Kylix under Linux) vs 89 seconds (FPC under Linux)... That
> is just too huge a performance difference to justify. Yes, we all know
> the argument about more platforms, maintainable code etc, but that
> couldn't possibly be the only reason for such a huge speed difference.
> Somewhere there is a serious bottleneck(s), or the FPC team simply
> disregards optimization completely. From what I have heard them say, the
> latter is more likely [unfortunately].

You completely miss the point. If there are only approx. 25
features/properties which each make the compiler 10% slower, then in
total FPC is more than 10 times (1.1^25 = 10.8) slower than before. Let
me name some of the stuff which might count only 10% each (some more,
some less) but in total makes FPC much slower:

- gcc compatible output format (*.ppu+*.o instead of one big ppu only
usable with FPC)
- flexible assembler output (different assemblers supported, assembler
listing output)
- flexible object file output (coff, elf, ...)
- portable ppus (regardless of the host system, ppus are bitwise equal.
this means e.g. a lot of endian conversion checks and calls)
- class helpers
- operator overloading
- generics
- architecture agnostic node tree
- architecture agnostic constant handling (96-bit arithmetic)
- portable code generator
- support of bit packed data structures
- flexible debugging info output
- completely written in a high level language
- architecture agnostic symtable handling
- using the rtl heap manager; a non-thread-safe heap manager would
probably be slightly faster, but do we want to maintain two full featured
heap managers?
- using ansistrings; one could switch to zero-terminated char arrays, but
do we really want this?
- architecture agnostic handling of procedure parameter passing
- code page aware reading of input files
- different FPU types supported
- portable optimizer
- 32/64 bit assembler supporting all available instruction sets
- high level code generator layer for jvm support
- support of different pascal dialects
- portable and flexible file handling
- using classes instead of objects
- ...

And yes, the speed penalties of these features/properties often multiply
rather than just add up. Of course, this is all simplified, but it
should give you an idea where the factor 10 comes from. It is a lot of
small things, none of them really counting on its own, but in total it's
a lot (exponential growth), and this makes it impossible to fix without
sacrificing a lot of FPC's power.
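
To make the arithmetic concrete, here is a minimal sketch (in Python, just
for illustration) comparing the two ways the penalties could combine. The
numbers 25 and 10% are the simplified figures from above, and the function
names are made up for this example; nothing here is measured from FPC.

```python
# Sketch: how small per-feature slowdowns combine.
# Assumption: every feature costs the same fixed fraction (a simplification).

def multiplicative_slowdown(n_features, penalty):
    """Each feature scales the total run time by (1 + penalty)."""
    return (1.0 + penalty) ** n_features

def additive_slowdown(n_features, penalty):
    """Each feature adds `penalty` times the original run time."""
    return 1.0 + n_features * penalty

n, p = 25, 0.10
print(f"multiplicative: {multiplicative_slowdown(n, p):.1f}x")  # 10.8x
print(f"additive:       {additive_slowdown(n, p):.1f}x")        # 3.5x
```

If the penalties were purely additive, the same 25 features would cost
only a factor of 3.5, so the compounding is what produces the factor 10.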

FYI: FPC 1.x is several times faster than FPC 2.x
FYI2: Last time somebody tried (years ago, -/+ year 2000), FPC compiled
by Delphi was slower than FPC compiled by FPC.

My goal regarding the speed of FPC is: let Moore win. This means faster
CPUs should speed FPC up more than new features in FPC slow it down, and
this has worked for 20 years now. So nothing to worry about.

>> You are only showing the Delphi/Kylix speed is
>> extremely superior
> 
> And Martin is just showing half the problem. The Delphi & Kylix
> compilers also produce executables that run 10+ times faster than what
> FPC 2.6.0 can produce. Even on the more optimized 32-bit compiler. And
> don't even think of mentioning that faster hardware will mask the
> problem - it doesn't. I have an i7-2660K running at 3.6GHz with high
> performance RAM and 450MB read speed SSD. I noticed a > 10+ times
> difference in running executables on my hardware.

Then something is wrong with your code. Or you hit some strange
bottleneck; you should really profile your code.