[fpc-devel] Re: Comparison FPC 2.6.2 - Kylix 3

Daniël Mantione daniel.mantione at freepascal.org
Mon Mar 4 13:38:50 CET 2013



On Mon, 4 Mar 2013, Martin Schreiber wrote:

> On Monday 04 March 2013 12:05:37 Florian Klämpfl wrote:
>> On 04.03.2013 01:00, Graeme Geldenhuys wrote:
>>> 4.4 seconds (Kylix under Linux) vs 89 seconds (FPC under Linux)... That
>>> is just too huge a performance difference to justify. Yes, we all know
>>> the argument about more platforms, maintainable code etc, but that
>>> couldn't possibly be the only reason for such a huge speed difference.
>>> Somewhere there is a serious bottleneck (or several), or the FPC team
>>> simply disregards optimization completely. From what I have heard them
>>> say, the latter is more likely [unfortunately].
>>
>> You completely miss the point. If there are only approx. 25
>> features/properties which each make the compiler 10% slower, then in
>> total FPC is about 10 times (1.1^25 ≈ 10.8) slower than before.
>
> Is this correct? It implies that every feature/property affects 100% of the
> total process. And if it is true, it is absolutely necessary to stop adding
> features soon, because 1.1^50 ≈ 117.4. ;-)

Some features only require processing power if you use them. However,
the features in Florian's list require continuous processing power. Two
examples of how features can impact overall speed:

1. Operator overloading

Operators are some of the most common tokens in source code. Without 
operator overloading, if you parse an operator, you simply generate a tree 
node.

With operator overloading, for each operator that you parse, you have to 
traverse all loaded units to check if the operator is overloaded. If there 
are 50 units loaded, this means 50 symtable lookups, simply because the 
operator might be overloaded.

For each operator overload candidate that is found, the compiler then has
to check many possible type conversions to see whether the candidate can
actually be used.
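
As a made-up illustration (example code of mine, not FPC's internals):
once a declaration like the one below is in scope in any loaded unit,
every '+' the parser sees is a potential call to it, and the operand
types of every candidate have to be checked:

  program vecdemo;
  {$mode objfpc}

  type
    TVec = record
      x, y: Double;
    end;

  { a global operator overload; any '+' anywhere in the program
    can now resolve to this routine }
  operator + (const a, b: TVec) r: TVec;
  begin
    r.x := a.x + b.x;
    r.y := a.y + b.y;
  end;

  var
    p, q, s: TVec;
  begin
    p.x := 1; p.y := 2;
    q.x := 3; q.y := 4;
    { for this '+' the compiler must find the candidate above and
      verify that TVec + TVec matches its parameter list }
    s := p + q;
    writeln(s.x:0:1, ' ', s.y:0:1);
  end.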

The situation with Pascal type conversion has grown increasingly complex
over the years. For example, almost any type can be converted into a
variant, and a variant can be converted into almost any type. This
requires all kinds of special handling, not only to do the right thing,
but also to avoid inefficient type conversions.
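
A small made-up example of what the variant rules allow; each assignment
below is a different implicit conversion the compiler has to recognize
and handle:

  program vardemo;
  {$mode objfpc}
  uses
    Variants;

  var
    v: Variant;
    i: Integer;
    d: Double;
    s: string;
  begin
    v := 42;        { Integer -> Variant }
    d := v;         { Variant -> Double }
    i := v;         { Variant -> Integer }
    s := v;         { Variant -> string, yields '42' }
    v := 'hello';   { string -> Variant }
    s := v;         { Variant -> string }
    writeln(i, ' ', d:0:1, ' ', s);
  end.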

So even if you don't use operator overloading or variants at all, they
still affect compiler speed.

2. Layered code generation

The split of the code generation into a high-level and a low-level layer
means that for every node that is processed, first the high-level virtual
method is called, which in turn calls the low-level virtual method. Thus
you have an additional virtual method call for every node processed.

The low-level code generator, which is still mostly CPU-independent, again
calls virtual methods from the abstract assembler layer to generate the
actual opcodes.

The abstract assembler, in turn, again has to worry about the multiple
assembler writers which can emit the final object file.

Now each layer not only has its own code, but also its own types, and
therefore conversion functions need to be called: for example, a def has a
size, which is converted into a cgsize and ultimately into an opsize.
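
A heavily simplified, invented sketch of that call chain (the names are
mine; the real code in hlcgobj.pas, cgobj.pas and the assembler writers
is of course far more involved):

  program layerdemo;
  {$mode objfpc}

  type
    { each layer has its own size type... }
    TCGSize = (cgs8, cgs16, cgs32, cgs64);
    TOpSize = (os8, os16, os32, os64);

    TAsmWriter = class
      procedure EmitMov(size: TOpSize); virtual;
    end;

    TLowLevelCG = class
      writer: TAsmWriter;
      procedure a_load(size: TCGSize); virtual;
    end;

    THighLevelCG = class
      cg: TLowLevelCG;
      procedure a_load(defsizebytes: Integer); virtual;
    end;

  { ...and converting between the size types costs extra calls too }
  function DefSizeToCGSize(bytes: Integer): TCGSize;
  begin
    case bytes of
      1: Result := cgs8;
      2: Result := cgs16;
      4: Result := cgs32;
    else
      Result := cgs64;
    end;
  end;

  function CGSizeToOpSize(s: TCGSize): TOpSize;
  begin
    Result := TOpSize(Ord(s));
  end;

  procedure TAsmWriter.EmitMov(size: TOpSize);
  begin
    writeln('mov emitted, opsize = ', Ord(size));
  end;

  procedure TLowLevelCG.a_load(size: TCGSize);
  begin
    writer.EmitMov(CGSizeToOpSize(size));     { virtual call, layer 3 }
  end;

  procedure THighLevelCG.a_load(defsizebytes: Integer);
  begin
    cg.a_load(DefSizeToCGSize(defsizebytes)); { virtual call, layer 2 }
  end;

  var
    hlcg: THighLevelCG;
  begin
    hlcg := THighLevelCG.Create;
    hlcg.cg := TLowLevelCG.Create;
    hlcg.cg.writer := TAsmWriter.Create;
    { one node -> two extra virtual calls and two size conversions
      before a single instruction reaches the object file }
    hlcg.a_load(4);
  end.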

Obviously, if you had just one layer and could output instructions
directly to the object file, you could gain a lot of performance.

While you might develop for just one platform, the fact that many of them
are supported costs compiler performance.

Daniël

