[fpc-devel] Class field reordering
tom_at_work at gmx.at
Tue Jul 17 13:59:17 CEST 2012
On Tue, 2012-07-17 at 08:22 +0200, Skybuck Flying wrote:
> > I also wonder how much of an optimization it actually is ? Maybe 0.000001%
> > more performance ?
> 1) as mentioned in the original mail, the current transformation is
> implemented for saving memory, not for improving performance
> This wasn't clear, it only mentions gaps. What kind of gaps ?
Gaps between the fields of an object instance due to alignment. Typically,
loading data from an unaligned address, i.e. an address that is not evenly
divisible by the field size, is slower than loading from an aligned one.
Some CPU architectures even raise an exception if you try.
E.g. for the following object

  test = class
    b1 : byte;
    d : double;
    b2 : byte;
    q : qword;
  end;

due to the above hardware limitations an instance will look as follows in
memory (first column indicates the offset, disregarding any additional
hidden fields):

  0     : b1
  1-7   : <empty>
  8-15  : d
  16    : b2
  17-23 : <empty>
  24-31 : q

So for storing 18 bytes of usable data, you use up 32 bytes in memory,
i.e. a waste of 44%.
Now imagine your program uses thousands of these objects.
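
The padded layout above can be reproduced with a few lines of arithmetic. The following is only a sketch: it assumes that every field is aligned to a multiple of its own size and it ignores hidden fields, which is a simplification of what a real compiler does. The names TFieldInfo, DeclOrder and PaddedSize are made up for this illustration.

```pascal
program AlignmentGaps;

{$mode objfpc}

type
  TFieldInfo = record
    Name: string;
    Size: Integer; // size in bytes, assumed to equal the required alignment
  end;

const
  // The fields of the class above, in declaration order.
  DeclOrder: array[0..3] of TFieldInfo = (
    (Name: 'b1'; Size: 1),
    (Name: 'd';  Size: 8),
    (Name: 'b2'; Size: 1),
    (Name: 'q';  Size: 8)
  );

// Lays out the fields in the given order, aligning each one to a multiple
// of its own size, and returns the offset just past the last field.
function PaddedSize(const Fields: array of TFieldInfo; Verbose: Boolean): Integer;
var
  i, Offset: Integer;
begin
  Offset := 0;
  for i := Low(Fields) to High(Fields) do
  begin
    // Round the offset up to the next multiple of the field size.
    Offset := ((Offset + Fields[i].Size - 1) div Fields[i].Size) * Fields[i].Size;
    if Verbose then
      WriteLn(Fields[i].Name, ' starts at offset ', Offset);
    Inc(Offset, Fields[i].Size);
  end;
  Result := Offset;
end;

var
  Total: Integer;
begin
  Total := PaddedSize(DeclOrder, True);
  WriteLn('total size: ', Total, ' bytes');
end.
```

Under these assumptions the fields start at offsets 0, 8, 16 and 24, and the total is 32 bytes for 18 bytes of payload.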
> Apparently you
> meant to minimize memory size, the opposite could also have been meant in the
> sense of optimizations to make fields fall on memory boundaries for perhaps
> increased fetch speed or something else.
You always want to make fields fall on memory boundaries (i.e. align
them) unless you are really short on memory, need a specific
layout for I/O purposes, or have some other reason to deviate (e.g. extra
padding in multi-threaded code when you want to avoid cache line
But then hopefully you know what you need to take care of and read the
manual. There is a way to disable this reordering on a per-class basis.
> Later performance optimization possibilities for the future are mentioned as
Given that software on reasonably modern hardware is very often memory
bound, a decrease in memory footprint often translates into real
performance gains.
In the above case, if this "optimization" is applied, the object
instance looks as follows (this is just an example, I do not know the
exact layout):
0-7 : d
8-15 : q
16 : b1
17 : b2
I.e. the object now uses exactly 18 bytes of memory.
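
One simple way to arrive at such a gap-free layout is to place the largest fields first. The sketch below does exactly that; it is not necessarily what the compiler implements. It again assumes that each field is aligned to a multiple of its own size, and the names TFieldInfo, TFieldList, ClassFields, SortBySizeDescending and PaddedSize are made up for this illustration.

```pascal
program ReorderFields;

{$mode objfpc}

type
  TFieldInfo = record
    Name: string;
    Size: Integer; // size in bytes, assumed to equal the required alignment
  end;
  TFieldList = array[0..3] of TFieldInfo;

var
  // The fields of the class above, in declaration order.
  ClassFields: TFieldList = (
    (Name: 'b1'; Size: 1),
    (Name: 'd';  Size: 8),
    (Name: 'b2'; Size: 1),
    (Name: 'q';  Size: 8)
  );

// Stable insertion sort, largest field first.
procedure SortBySizeDescending(var Fields: TFieldList);
var
  i, j: Integer;
  Tmp: TFieldInfo;
begin
  for i := Low(Fields) + 1 to High(Fields) do
  begin
    Tmp := Fields[i];
    j := i - 1;
    while (j >= Low(Fields)) and (Fields[j].Size < Tmp.Size) do
    begin
      Fields[j + 1] := Fields[j];
      Dec(j);
    end;
    Fields[j + 1] := Tmp;
  end;
end;

// Aligns each field to a multiple of its own size and returns the offset
// just past the last field.
function PaddedSize(const Fields: TFieldList): Integer;
var
  i, Offset: Integer;
begin
  Offset := 0;
  for i := Low(Fields) to High(Fields) do
  begin
    Offset := ((Offset + Fields[i].Size - 1) div Fields[i].Size) * Fields[i].Size;
    Inc(Offset, Fields[i].Size);
  end;
  Result := Offset;
end;

var
  Before, After: Integer;
begin
  Before := PaddedSize(ClassFields);
  SortBySizeDescending(ClassFields);
  After := PaddedSize(ClassFields);
  WriteLn('declaration order: ', Before, ' bytes');
  WriteLn('largest first:     ', After, ' bytes');
end.
```

For the four fields of the example this gives 32 bytes in declaration order and 18 bytes after sorting, with the order d, q, b1, b2.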
> 2) if it was done for performance reasons, some people already got up to 34%
> extra performance by doing exactly that:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.4009 (download
> the cached version of the paper, the original link no longer works)
> I scanned over this document and it seems to mention "profiling
> information", it also mentions "compilers which can then use this profiling
> information to re-arrange fields".
> To me it seems this optimization idea belongs in the realm of "profilers".
> It should be easy for a programmer to use such a profiling tool and make the
> necessary changes him/herself instead of complicating the compiler.
> [.. snipped the rest...]
Imo for a static compiler like fpc, re-arranging fields for pure
performance reasons without reasonably accurate profiling information is
indeed pure guesswork.
I do not see a problem with automating this process.
However, as mentioned above, decreasing the memory footprint often also
improves performance.
> > I rarely inspect the binary equivalent of a class instance, so your
> > supposedly optimization is probably not a big deal, for records that would
> > be a different matter since these are used in all kinds of api's and
> > input/output situations.
> As mentioned in the original mail, the transformation would only be applied
> to classes.
> How is a record not an abstraction and a class is an abstraction, that's
> kinda weird/inconsistent ?!
Because unfortunately people are already (mis-)using records by
blockwriting/reading them without knowing that this is not portable at
all.
Actually, such things already break when moving e.g. from 32 to 64 bit
processors, or from one CPU architecture to another with different
alignment rules, and so on. When 64 bit was new, there were many
questions/issues about exactly that - on this list too.
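
To illustrate the portability problem: the in-memory size of a record depends on the target, so dumping it as one block produces files that differ between platforms. A more robust approach is to write each field separately with a fixed width. The following is only a sketch. TSample and WriteSample are made-up names, and byte order is deliberately not handled here, so it would still need attention when exchanging files between CPUs with different endianness.

```pascal
program PortableWrite;

{$mode objfpc}

uses
  Classes;

type
  TSample = record
    Flag: Byte;
    Value: Double;
    Count: PtrInt; // 4 bytes on 32 bit targets, 8 bytes on 64 bit targets
  end;

// Writes the record field by field with fixed widths, so the number of
// bytes written does not depend on padding or on the size of PtrInt.
procedure WriteSample(S: TStream; const R: TSample);
var
  Count64: Int64;
begin
  S.WriteBuffer(R.Flag, SizeOf(R.Flag));
  S.WriteBuffer(R.Value, SizeOf(R.Value));
  Count64 := R.Count; // always stored as 8 bytes
  S.WriteBuffer(Count64, SizeOf(Count64));
end;

var
  M: TMemoryStream;
  R: TSample;
begin
  R.Flag := 1;
  R.Value := 2.5;
  R.Count := 42;
  M := TMemoryStream.Create;
  try
    WriteSample(M, R);
    WriteLn('SizeOf(TSample) = ', SizeOf(TSample), ' (target dependent)');
    WriteLn('bytes written   = ', M.Size);
  finally
    M.Free;
  end;
end.
```

WriteSample always emits 1 + 8 + 8 = 17 bytes, whereas SizeOf(TSample) varies with the target and its alignment rules.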
It looks like a purely pragmatic decision. (And maybe sometime somewhere
the Borland people defined it that way for records).
> >>> It's already bad that Delphi adds invisible fields to classes so they
> >>> cannot be simply dumped to disk... (virtual method table pointers ?)
> >>> this would make it even worse.
> >> If you want to program at an assembler level of abstraction, don't use
> >> high level language features.
> > I see no reason why a high level language could not be used to produce
> > binary instructions and or files/data.
> It can be used for that, as long as you don't use high level abstractions.
> The whole point of abstractions is to get rid of any guarantees as far as
> implementation is concerned, in order to increase portability, programmer
> productivity and compiler optimization opportunities.
> How about instead extending the pascal language description and specifying
> that the order of the fields in the class and records must be the same in
> binary as well. This seems nice and constant and might allow some other
> functionalities in the future.
> That is not to say that this supposedly optimization could be done and
> later then removed if this order extension is introduced.
Given that not fixing the order has the above-mentioned tangible advantages
right now, and fixing the layout only some unknown ones in the future,
it seems the better choice not to do it. Also because it can still be
specified in the future if needed :)
So we agree.