[fpc-devel] inline... and philosophy

Fri Nov 8 04:01:01 CET 2019

Hi everyone,

This is probably more rant-like than it's supposed to be, and maybe a 
bit philosophical, especially after I ran my mouth in the jump 
optimisation issue.

So I'm wondering what the future will be for the inline directive, since 
it seems to be very divisive.  On one side, its usefulness is questioned 
now that 'auto-inline' has been proposed, while on the other side some 
live by it for minor performance gains.  There are always going to be 
unexpected uses for 
things, and one point of contention that was recently raised is the way 
I used it on a function that I knew was only called in one particular 
place and would stay that way for the foreseeable future.  Though I wrote 
a comment to explain what I did, apparently that's not a valid reason or 
use for it.

I guess that's also an interesting argument in terms of purity of 
purpose... do you go by what something was designed to do or what it 
actually can do?  There's an argument for both - the approach of purity 
yields cleaner code but it might suffer with performance, and the 
approach of practicality is potentially more efficient, but may be much 
more difficult to maintain.

Granted, regardless of one's stance, one does have to abide by a 
project's coding standards, even if they are a little nebulous sometimes 
and one doesn't often know what they are until they're violated.  I 
apologise for getting so upset over the presence of a handful of 
directives in my patch - to explain, there were a couple of new methods 
related to my jump optimisations that were marked as inline (so long as 
DEBUG_JUMP wasn't defined) - each function was only called in exactly 
one location with no intention of calling them elsewhere.  It was set up 
this way to compartmentalise the code, separating the jump optimisations 
from the main loop of the Peephole Optimizer.  Florian took issue with 
the inline directives despite the comments explaining why they were 
there, since it wasn't exactly in the spirit of what inline is meant for 
(leaf functions that compile into only a few machine code 
instructions).  I wasn't too happy about the argument that one of the 
functions MAY get called a second time in the future - possibly, but you 
can remove the inline directive when that time comes (they'd be looking 
up the function header at the very least to see what the parameters are).
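
As a rough sketch of that pattern (the function name and its body are 
hypothetical placeholders, not the code from the patch; only the 
DEBUG_JUMP symbol and the conditional inline come from the description 
above):

```pascal
program InlineSketch;
{$mode objfpc}
{$inline on}

{ Hypothetical stand-in for a jump-optimisation helper that is called from
  exactly one place.  It is marked inline unless DEBUG_JUMP is defined, so
  that a debug build keeps it as a separate routine. }
function CollapseJumpChain(Target: Integer): Integer; {$ifndef DEBUG_JUMP}inline;{$endif}
begin
  { Placeholder logic only: pretend every odd label forwards to the next. }
  if Odd(Target) then
    Result := Target + 1
  else
    Result := Target;
end;

var
  i: Integer;
begin
  { The single call site, standing in for the peephole optimiser's main loop. }
  for i := 1 to 4 do
    WriteLn(i, ' -> ', CollapseJumpChain(i));
end.
```

Compiling with -dDEBUG_JUMP defines the symbol, which drops the inline 
modifier and leaves the function as an ordinary call.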

I guess sometimes, especially when you've worked on some closed-source 
applications, you come across the occasional 'black magic' (for a good 
real-world example, look up "fast inverse square root") that takes a 
university thesis to explain! Depending on where you work, some places 
are quite strict with coding standards and how many changes you squeeze 
into a single commit, while others are a bit more cavalier... which leads 
to the 
minefield that's legacy code!  For me personally, I've always pushed for 
performance, hence why I have no problem dropping into assembly language 
on some critical leaf functions, but I do want to go out of my way to 
explain what I'm doing so another programmer can follow what's happening 
and improve on it if needs be.

To go back to the original subject, what are the intentions with inline? 
I've gotten the impression from Florian that he doesn't like the 
directive and would much rather leave it to the compiler to auto-inline 
short functions, which I guess is fair, since the compiler can 
potentially make better judgement calls on a per-platform basis (e.g. 
vector addition on AMD64 platforms collapses into a single machine 
instruction, not counting memory moves, whereas a platform that doesn't 
have vector registers may end up with something much longer).  As for 
my example, 
auto-inlining a long function that only gets called from one location 
is certainly possible if the compiler counts the references to each 
function, but something tells me it would require whole-program 
optimisation, especially if said function is public (or protected).

I don't trust a compiler to produce the most optimal code, whether it be 
Free Pascal, Delphi or a C++ compiler, and if I am greatly concerned 
about execution speed, I look at the disassembly (I am aware that most 
mainstream programmers don't do this).  In most cases, it's inefficient 
high-level code that simple refactoring can fix (to use an exaggerated 
example, changing a sorting algorithm to use quicksort instead of 
bubblesort), while other times there is little you can do from the code 
alone.  I do try to help the compiler though by giving hints like the 
inline directive.  To branch into a semi-related issue of the compiler 
deciding what's best... when I get pure functions working, one could 
argue that the compiler should be smart enough to determine if a 
function is pure or not, and hence would have no need for a 'pure' 
directive - this is true, but would also be prohibitively slow, since 
the compiler would be analysing every node in every function with a 
fine-tooth comb.  That is partly my concern with auto-inline as well... 
though there's an upper limit on the number of nodes before it decides 
to not inline it, it still has to analyse the nodes one-by-one, and even 
counting them requires iterating through what is effectively a linked list.

I know originally that one of Turbo Pascal's selling points was the 
speed of its compiler compared to other compilers of the day. I don't 
know if compilation speed is in Free Pascal's manifesto, but I like to 
think it is, hence my acceptance of compiler hints like 'inline', 
whereas auto- features and anything requiring whole program optimisation 
(even though that's a Release Build thing) are anything but fast.  
Speaking of whole program optimisation, it always seems very fiddly to 
set up, to the point that the FPC bootstrapper needs a script to get it 
working.  It's not exactly user-friendly and practically demands 
learning a separate skill (at least, I've struggled with it). Shouldn't the 
compiler have an option to do the two stages of whole program 
optimisation (generate the information files, then use the information 
files) in one sitting? It would take a long time to complete, but 
advertising it as a Release Build thing should prevent most people from 
running it accidentally.  I personally like seeing file sizes drop and 
execution speed climb in my compiled binaries!

Gareth aka. Kit