[fpc-devel] Successful implementation of inline support for pure assembler routines on x86

Sun Mar 24 13:09:32 CET 2019

Am 24.03.2019 um 11:33 schrieb J. Gareth Moreton:
> The main thing is the degree of control you have using pure assembler over intrinsics, and someone brought up that
> intrinsics don't give you good access to the FLAGS register. 

Juggling the flags is rarely possible on x86 anyway, as almost all instructions modify them.

> Additionally, unless you do some rather untidy nested
> parameter chaining (calling an intrinsic and passing its result into another intrinsic, several layers deep), you don't
> have too much control over how the results are stored.  Normally not a terrible thing, but if you have a temporary value
> that you know will be discarded, you want it in a register and never stored on the stack, for example.

Pure assembler does not ensure this either. It is even worse here: the instructions always use the same registers,
so the code is very prone to spilling. In most cases, inlined pure assembler routines will result in far worse
code than intrinsics because the compiler cannot change the register usage in the inlined assembler.

> 
> There's also the issue of maintenance... writing intrinsics for every single possible instruction on every single
> platform and determining that they behave in the way they should.

Just look at the intrinsics branch: this can easily be automated.

> 
> I guess we have been spoilt in a way because Pascal has always supported a clean and efficient way to drop into assembly
> language if you so choose, and this is what I've gotten used to rather than the intrinsics of C++.  I don't like the
> idea of putting breakpoints on the intrinsics and opening up the Disassembly window just to check that the compiler
> isn't blindly storing temporary values on the stack.

... which will happen much more often with inlined assembler, as the compiler is less flexible regarding register usage.

> 
> If I had to give one final reason... there are already functions in the RTL that are written in pure assembly language
> that would easily benefit being inlined, such as SwapEndian and Trunc.  

Actually, trunc renders all these reasons void: it is inlined if the code is compiled for an architecture supporting
the needed instructions (i386-win32, compiled with -Cpcoreavx2 -Cavx2):

# [5] writeln(trunc(d));
	call	fpc_get_output
	movl	%eax,%ebx
	fldl	U_$P$PROGRAM_$$_D
	fisttpq	-8(%ebp)
	pushl	-4(%ebp)
	pushl	-8(%ebp)
	movl	%ebx,%edx
	movl	$0,%eax
	call	fpc_write_text_int64

This is not possible with inlined pure assembler routines: they would use the instruction set selected when the RTL was
compiled. trunc shows perfectly why intrinsics are the way to go.
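
For reference, a minimal program along the lines of the listing above might look like the sketch below. The program
name and the value assigned to d are assumptions for illustration; the original test source is not shown in this mail.
Compiling it with the options given above and keeping the assembler output (for example with -al) should show whether
trunc was expanded inline for the selected target.

```pascal
program truncdemo;

var
  d: double;

begin
  d := 3.7;            { assumed sample value, not taken from the original test }
  writeln(trunc(d));   { trunc is expanded by the compiler, so the code follows the caller's target options }
end.
```

Because the expansion happens when this program is compiled, the instruction choice depends on the options passed
for the program itself, not on how the RTL was built.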

The same could be done for SwapEndian.
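
A small sketch of what that would mean at the call site: the caller only writes plain Pascal, and the instructions
used for the byte swap would be chosen when the caller is compiled. The program name and the constant are arbitrary
example choices, and whether the call is actually expanded inline depends on the compiler version and target.

```pascal
program swapdemo;

var
  x: longword;

begin
  x := $11223344;          { arbitrary example value }
  x := SwapEndian(x);      { byte order reversed: $44332211 }
  writeln(HexStr(x, 8));   { prints 44332211 }
end.
```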

> Otherwise they'd have to be rewritten to use
> intrinsics if anyone remembers to.
> 
> There is one other thing... intrinsics haven't been merged into the trunk yet, so we can't test them or determine if
> they are actually what we desire.

This is not a valid reason. You can play with the svn branch.