[fpc-devel] inline... and philosophy

Marco van de Voort core at pascalprogramming.org
Mon Nov 11 10:43:50 CET 2019


On 10/11/2019 at 16:02, J. Gareth Moreton wrote:
> This message chain has proven to be a lot more educational and 
> insightful than I would have given it credit for.  Thanks everybody!
>
> I know a lot of the time, the size of binaries is just an illusion, 
> along with unfair comparisons with GCC (a behemoth with corporate 
> support) and Microsoft Visual C++ that often hides the size of 
> binaries behind a redistributable library.  I don't ever seek to make 
> binaries smaller at the expense of speed, but if I see a potential 
> saving that could be done automatically, I dive for it!

Keep in mind that size differences (if more than a few percent) are
usually not related to compiler efficiency, but to other factors like
framework architecture (RTTI, class registration) and redistributable
libraries (MSVCRT, Qt). Plain WinAPI binaries can be quite small with
FPC too. The LCL is simply more high-level: not just higher-level than
the WinAPI, but higher-level than MFC too.

>> (and btw, if you are serious about these scenarios, drop all 
>> optimization work immediately, and start working on packages :-)
>
> I did try to start simple with the 'uComplex' unit, but concerns were 
> raised because I changed the formal parameters to 'const' and aligned 
> the complex type on x86-64 platforms so it can take advantage of XMM 
> registers better (which, given proper optimisation, would result in 
> both smaller code size and higher speed).  While I made sure that the 
> interfaces would not change for Pascal code, assembler code that calls 
> the functions (if it exists) might need to be changed slightly 
> (something Florian raised).  I'm not quite sure what the rules are 
> when it comes for updating packages, other than the obvious one of not
> breaking old code.


I tested uComplex with my FFT test case yesterday, btw. I saw no
differences, but it turned out that:

- I work with a Single-based complex record, so the record is only 8
bytes and doesn't really benefit from vectorcall.

- every procedure in the FFT unit that takes value parameters is
inlined anyway.

- there is no vectorization whatsoever, so a complex addition is not
done in one step. That probably needs either a vectorizer or intrinsics.

The only other thing I noticed is that the compiler seems to use only
XMM0, occasionally XMM1, and extremely rarely XMM2. Are there no
register variables for XMM floating point?

>
> I like working on optimisation because I have a morbid fascination 
> with the lowest level of the CPU and I feel well-suited for it, 
> although there are still some things I'm learning about it.
>
There is nothing wrong with that. But it is wise not to lose track of
magnitudes.

