[fpc-devel] Successful implementation of inline support for pure assembler routines on x86

Sun Mar 17 19:57:25 CET 2019

On 17/03/2019 18:18, J. Gareth Moreton wrote:
> Part of it may be preference but I think
> some people like the fine degree of
> control that assembly language offers,

That is absolutely correct. That is both its strength and its weakness. 
The weakness is that it is impossible to integrate such code safely in 
compiler-generated code without the programmer saying exactly what that 
code does (in terms of constraints, like GCC supports: 
https://gcc.gnu.org/onlinedocs/gcc/Constraints.html ).

E.g., at least the following issues exist with your patch, but that's 
not because your code is of bad quality. It's simply that it is 
impossible to fully analyse inline assembly and determine it to be safe:
* you forbid modifying the stack, but loading the stack pointer in 
another register and then modifying the stack through this other 
register is not caught (or e.g. loading a value from memory that happens 
to point to the stack)
* you skip over db/dw/dd/dq directives, even though these can also be 
used to encode instructions (often ones not (yet) supported by the 
compiler). There may be more such assembler directives that could 
influence the code.
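
To illustrate the second point, here is a minimal sketch in GNU C (not FPC syntax; the function name is made up for the example). It uses a data directive to emit a raw opcode byte, which is the kind of thing a check that skips db/dw/dd/dq would never see. On x86 the byte 0x90 happens to be a harmless NOP, but it could just as well be an instruction that touches the stack or a register.

```c
/* Hypothetical helper: executes an instruction that only exists as a
   raw data byte in the assembler source. */
static inline int run_hidden_nop(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* ".byte 0x90" is the encoding of NOP on x86. A scanner that looks
       only at mnemonics sees a data directive here, not an instruction. */
    __asm__ __volatile__(".byte 0x90");
#endif
    /* On other targets the block is skipped; x is returned unchanged
       either way. */
    return x;
}
```

The FPC equivalent would be a "db $90" line inside an asm block.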

Additionally, your remark regarding memory barriers is a bit dangerous: 
these instructions must not only act as memory barriers to the 
processor, but also to the compiler. I.e., the compiler must not be 
allowed to optimise certain things across such a barrier (e.g. (re)move 
memory reads or writes), because then the barrier will no longer serve 
its purpose. That is the main reason why marking them as "they change 
everything" should probably stay for the foreseeable future.
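
As a rough sketch of the difference, again in GNU C rather than Pascal and with names invented for the example: an empty asm statement with a "memory" clobber is a barrier for the compiler only, while the GCC/Clang builtin __sync_synchronize() is a barrier for both the compiler and the processor.

```c
static int shared_data;
static int ready_flag;

/* Compiler-only barrier: emits no instruction, but the "memory" clobber
   tells the compiler that memory may be read or written here, so it must
   not cache values in registers or move loads/stores across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

/* Full barrier: GCC/Clang builtin that emits a hardware fence and is
   also treated as a barrier by the compiler. */
static inline void full_barrier(void)
{
    __sync_synchronize();
}

/* Hypothetical publisher: the store to shared_data must stay before the
   store to ready_flag, both in the generated code and on the processor. */
void publish(int value)
{
    shared_data = value;
    full_barrier();
    ready_flag = 1;
}
```

If an inlined barrier instruction were treated as an ordinary instruction with no side effects on memory, the compiler would be free to reorder the two stores in publish(), and the fence would protect nothing.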

The performance overhead of memory barriers is also many times greater 
than that of a call/return, so I don't think it will actually matter 
that much (although it would still be better to get rid of the 
call/return than not, of course -- provided the compiler can be told to 
not optimise anything across it).

I thought I sent a mail in the previous thread about this, but I can't 
find it anymore so maybe I did not. What I thought I said before is that 
I think that inlining pure assembler functions is something that should 
never be done. A pure assembler function, especially with 
"nostackframe", is the programmer literally telling the compiler "you 
have absolutely no business messing with this code".

On the other hand, if you have a regular function with an inline 
assembler block, then inlining becomes a whole lot more feasible. 
Especially if you add support for GCC-like constraints. Then there is no 
issue with the assembler code expecting arguments in certain registers, 
possibly returning in the middle of the block, messing up the stack etc., 
because you simply cannot do that in this scenario. This means you don't 
have to (try to) check for this either. And there is already rudimentary 
support for specifying constraints in this case (which registers get 
modified).
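
For reference, this is roughly what such a block looks like with GCC's extended asm syntax. It is a sketch of the GCC feature only, not of anything FPC supports today, and the function name is made up. The assembler code never names a concrete register: the compiler picks them and substitutes %0 and %1, so the block can be inlined anywhere.

```c
#include <stdint.h>

/* Hypothetical helper: adds b to a using an inline assembler block. */
static inline uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__("addl %1, %0"
            : "+r"(a)   /* %0: read and written, any general register */
            : "r"(b)    /* %1: read only, any general register */
            : "cc");    /* the flags register is modified */
    return a;
#else
    /* Portable fallback so the sketch also builds on non-x86 targets. */
    return a + b;
#endif
}
```

Everything the compiler needs to know (inputs, outputs, clobbers) is stated in the constraints, so it does not have to analyse the instructions themselves.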

It would be much less of a quick win (e.g. because the compiler does not 
support passing variables in registers to assembler blocks right now), 
but in the long run it would be fully supportable and much more 
maintainable. It would also require much less target-specific support, 
because it would not require trying to figure out what the assembler 
block is doing.

That said: for optimal performance, you will usually still want 
intrinsics rather than inline assembly, simply because the compiler can 
then be taught to reason about them, and perform constant propagation 
through them (and potentially eliminate them altogether).
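
A small sketch of what that means in practice, using the GCC/Clang builtin __builtin_popcount as a stand-in for an intrinsic (the wrapper names are made up for the example):

```c
/* With an intrinsic, the compiler knows what the operation computes. */
static inline int popcount_intrinsic(unsigned x)
{
    return __builtin_popcount(x);
}

/* Hypothetical caller with a constant argument: an optimising compiler
   can typically evaluate the intrinsic at compile time and reduce this
   function to returning a constant, with no popcount instruction left. */
int bits_in_mask(void)
{
    return popcount_intrinsic(0xF0u);
}
```

The same operation written as an inline assembler block would be opaque to the compiler: it would have to load the constant into a register and execute the instruction at run time.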

Jonas