<HTML>
There are some good examples where being able to inline assembler routines directly is beneficial, but where intrinsics don't cut it. One example is the following:<br>
<br>
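function GetClockTics_RDTSC: QWord; assembler; nostackframe; inline;<br>
asm<br>
RDTSC { time-stamp counter: low 32 bits in EAX, high 32 bits in EDX (already the QWord result on i386) }<br>
{$ifdef CPUX64}<br>
SHL RDX, 32 { x86-64: merge both halves into RAX }<br>
OR RAX, RDX<br>
{$endif}<br>
LFENCE { later instructions wait until RDTSC has completed }<br>
end;<br>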
<br>
Taken from: https://github.com/martok/fpc-microbench/blob/master/umicrobench.pas#L43<br>
<br>
Here, intrinsics won't work because this is a function with several lines of assembler that have to be grouped in this way. You could use 4 intrinsics for such a thing, but it would quickly become messy, and whatever project they appear in would be a nightmare to maintain (especially if GetClockTics_RDTSC needs to be called in several places and you replace each reference with the 4 intrinsics). More theoretically, how would one program an intrinsic whose output is placed into two distinct registers that then have to be combined? True, a compiler might be able to work that out, but it seems needlessly convoluted.<br>
<br>
The final point to raise: why make this function inline at all? It is short enough to be eligible, but more importantly, the overhead of calling a function to retrieve a fairly high-precision clock count adds a large element of unpredictability to the result (normally 5 cycles, but it can be much more). Inserting the assembler directly into the caller, rather than calling a separate function, removes this uncertainty while still keeping the readability benefits of having the code in a separate subroutine.<br>
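A minimal usage sketch of the kind of measurement described above, assuming an x86-64 target and Intel assembler syntax. The program name, the loop being timed and the compile-time target check are my own illustrative choices; the routine body is the one quoted above with the {$ifdef} removed. The difference is in time-stamp counter ticks, and whether the two calls are actually inlined depends on the compiler (and patch) in use.<br>

```pascal
program RdtscTiming;
{$mode objfpc}
{$asmmode intel}
{$ifndef CPUX86_64}
  {$error This sketch assumes an x86-64 target}
{$endif}

{ Same body as GetClockTics_RDTSC above, specialised to x86-64 }
function GetClockTics_RDTSC: QWord; assembler; nostackframe; inline;
asm
  RDTSC          { counter: low 32 bits in EAX, high 32 bits in EDX }
  SHL RDX, 32
  OR RAX, RDX    { 64-bit result in RAX }
  LFENCE         { later instructions wait until RDTSC has completed }
end;

var
  T0, T1: QWord;
  I, Sum: LongInt;
begin
  Sum := 0;
  T0 := GetClockTics_RDTSC;
  for I := 1 to 1000 do     { the code being measured }
    Inc(Sum, I);
  T1 := GetClockTics_RDTSC;
  WriteLn('Sum = ', Sum, ', elapsed ticks: ', T1 - T0);
end.
```

If the calls are inlined, no call/return overhead ends up between the two counter reads.<br>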
<br>
<div>Note that my proof-of-concept patch currently refuses to inline the above function because the properties for LFENCE are simply Ch_All, which indicates that everything is modified (in reality, LFENCE doesn't modify anything). The compiler therefore assumes that non-volatile registers are trashed and hence marks the routine as "cannot inline". This can be fixed later by setting the properties for LFENCE correctly (I'm not sure about RDTSC just yet).</div><div><br>
</div><div>I just want to show that there are definitely uses and benefits to the feature, and that it can be done safely. In my patch, the compiler won't even attempt to inline assembler routines on non-Intel platforms, because the default implementation of the new "CanInline" method simply returns False. This also allows the inlining of assembler routines to be implemented on another platform at a later date by overriding the methods in a descendant class and programming the necessary functionality.<br>
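The opt-in design described above (a virtual "CanInline" whose default returns False, overridden per CPU) can be sketched as follows. The class names and the parameterless signature are hypothetical stand-ins, since the patch's actual classes and signatures aren't shown in this thread.<br>

```pascal
program CanInlineDispatch;
{$mode objfpc}

type
  { Hypothetical stand-ins: the patch's real class names and method
    signatures are not shown in this thread }
  TAsmInlinerBase = class
    function CanInline: Boolean; virtual;
  end;

  TX86AsmInliner = class(TAsmInlinerBase)
    function CanInline: Boolean; override;
  end;

function TAsmInlinerBase.CanInline: Boolean;
begin
  { Default for every CPU: never try to inline a pure assembler routine }
  Result := False;
end;

function TX86AsmInliner.CanInline: Boolean;
begin
  { A real override would scan the routine's instructions here;
    this sketch simply opts in }
  Result := True;
end;

var
  Inliner: TAsmInlinerBase;
begin
  Inliner := TAsmInlinerBase.Create;
  WriteLn('Generic CPU: ', Inliner.CanInline);
  Inliner.Free;

  Inliner := TX86AsmInliner.Create;
  WriteLn('x86 CPU:     ', Inliner.CanInline);
  Inliner.Free;
end.
```

A CPU without an override inherits the safe default, so nothing changes on that platform until someone implements and tests the descendant class.<br>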
<br>
Of course, I invite anyone and everyone to try to break the compiler with my patch (it's attached to the first post of this particular thread)!<br>
<br>
Gareth aka. Kit<br>
</div> <br>
<br>
<span style="font-weight: bold;">On Tue 12/02/19 11:51 , "J. Gareth Moreton" gareth@moreton-family.com sent:<br>
</span><blockquote style="BORDER-LEFT: #F5F5F5 2px solid; MARGIN-LEFT: 5px; MARGIN-RIGHT:0px; PADDING-LEFT: 5px; PADDING-RIGHT: 0px">
And this e-mail contains additional files to aid with testing and showcasing:<br>
<br>
- x86-inline-assembler-rtl-samples.patch - makes some internal RTL functions inline, such as Trunc, which is just a single instruction.<br>
- x86_inline_asm_test.pp - an x86-64 test program that has some hand-written assembler routines that have been inlined. The output should be identical whether or not "inline" is specified.<br>
<br>
Additionally, "tests/test/cg/tvectorcall3.pp" has an assembler function that is inlined, and this can be used to test whether it works properly for vectorcall under Win64 and the regular System V ABI under Linux (it only uses the XMM registers, so the registers used for the parameters and the return value are identical under both conventions).<br>
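To illustrate why an XMM-only routine is portable between the two ABIs, here is a small sketch. It is my own example, not the body of tvectorcall3.pp, and it uses the default calling convention rather than vectorcall, assuming an x86-64 target and Intel assembler syntax.<br>

```pascal
program XmmOnlyInline;
{$mode objfpc}
{$asmmode intel}
{$ifndef CPUX86_64}
  {$error This sketch assumes an x86-64 target}
{$endif}

{ Both the Win64 convention and the System V ABI pass the first two Double
  parameters in XMM0 and XMM1 and return a Double in XMM0, so one body
  works on both platforms. XMM1 is volatile under both. }
function SumOfSquares(A, B: Double): Double; assembler; nostackframe; inline;
asm
  MULSD XMM0, XMM0   { XMM0 := A * A }
  MULSD XMM1, XMM1   { XMM1 := B * B }
  ADDSD XMM0, XMM1   { result: A * A + B * B }
end;

begin
  WriteLn(SumOfSquares(3.0, 4.0):0:1);
end.
```

The output should be the same whether or not the call is actually inlined.<br>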
<br>
Gareth aka. Kit<br>
<br>
<br>
<span style="font-weight: bold;">On Tue 12/02/19 11:43 , "J. Gareth Moreton" gareth@moreton-family.com sent:<br>
</span><blockquote style="BORDER-LEFT: #F5F5F5 2px solid; MARGIN-LEFT: 5px; MARGIN-RIGHT:0px; PADDING-LEFT: 5px; PADDING-RIGHT: 0px">
This is something I've been researching for a while: the ability to inline procedures that are written in pure assembler. I've got something working pretty well and I'd like to showcase it.<br>
<br>
It's something that has garnered a little uncertainty from others because of how easy it is to introduce compiler bugs, and because of the difficulty of supporting other platforms, for example. Currently, my code is restricted to i386 and x86_64 because those are the only platforms I can actually test on. Nevertheless, the new virtual methods will easily allow extension to other CPUs, while blocking "inline" on pure assembler routines by default.<br>
<br>
There are a number of restrictions on what can and can't be inlined, specifically:<br>
- The routine must have the "nostackframe" directive.<br>
- You cannot write to the stack.<br>
- No parameters or return values may be on the stack.<br>
- You cannot write to a non-volatile register.<br>
<br>
... among a few others. The internal procedure checks each instruction against the "InsProp" array (although a number of opcodes just have "Ch_All" specified, which my code takes to mean that everything is modified, so it marks the procedure as 'cannot inline').<br>
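A heavily simplified model of this check, under stated assumptions: every name below (TChange, chAll, CanInlineModel, the sample sequences) is hypothetical, each instruction is reduced to the set of things it changes, and chAll stands in for Ch_All. The real patch reads the compiler's InsProp table instead, and the non-volatile set here is only a subset of the Win64 one.<br>

```pascal
program InlineCheckModel;
{$mode objfpc}

type
  { Hypothetical model: each instruction is reduced to the set of things
    it changes. chAll plays the role of Ch_All. }
  TChange = (chRAX, chRCX, chRDX, chRBX, chRSP, chRBP, chStackMem, chAll);
  TChanges = set of TChange;

const
  { A subset of the registers a Win64 routine must preserve }
  NonVolatile: TChanges = [chRBX, chRSP, chRBP];

  { RDTSC; SHL RDX, 32; OR RAX, RDX }
  RdtscSeq: array[0..2] of TChanges = ([chRAX, chRDX], [chRDX], [chRAX]);
  { The same, followed by an LFENCE whose properties are only Ch_All }
  RdtscLfenceSeq: array[0..3] of TChanges =
    ([chRAX, chRDX], [chRDX], [chRAX], [chAll]);
  { PUSH RBX; MOV RBX, ... }
  SaveRbxSeq: array[0..1] of TChanges = ([chRSP, chStackMem], [chRBX]);

{ False as soon as any instruction might modify everything, writes to the
  stack, or changes a non-volatile register }
function CanInlineModel(const Instrs: array of TChanges): Boolean;
var
  I: Integer;
begin
  for I := Low(Instrs) to High(Instrs) do
    if (chAll in Instrs[I]) or (chStackMem in Instrs[I]) or
       (Instrs[I] * NonVolatile <> []) then
      Exit(False);
  Result := True;
end;

begin
  WriteLn('RDTSC sequence:          ', CanInlineModel(RdtscSeq));
  WriteLn('RDTSC + LFENCE (Ch_All): ', CanInlineModel(RdtscLfenceSeq));
  WriteLn('PUSH RBX sequence:       ', CanInlineModel(SaveRbxSeq));
end.
```

In this model, a single Ch_All entry is enough to reject an otherwise harmless routine, which matches the LFENCE situation with GetClockTics_RDTSC.<br>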
<br>
I've so far built this on x86_64-win64, i386-win32 and x86_64-linux (I couldn't do i386-linux due to problems with missing tools, but it gets quite far in the compilation otherwise - if someone can do a more strenuous test, I'd be grateful) and done some tests with internal functions and some showcase functions, with promising results.<br>
<br>
Some other things that the inlining routine does:<br>
- If jumps and labels are found, new local ones are generated.<br>
- If RET is found, it is changed to a JMP and a new destination label generated at the end of the inserted code.<br>
- The markers at the beginning and end are removed, so peephole optimisation is actually performed on the inserted code - this is mostly to address some inefficiencies that crop up from moving parameters into the expected registers (e.g. the first integer parameter into RCX under Win64).<br>
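A routine that exercises both the label and the RET rewriting might look like the following. It is my own illustrative test case, not one from the attached test program, and it assumes x86-64, Intel assembler syntax and the default register calling convention. It also stays within the restrictions listed earlier: nostackframe, parameters and result in registers, and only volatile registers modified.<br>

```pascal
program InlineAsmLabels;
{$mode objfpc}
{$asmmode intel}
{$ifndef CPUX86_64}
  {$error This sketch assumes an x86-64 target}
{$endif}

{ Returns the larger of A and B. The early RET and the local label @UseB are
  the constructs that need rewriting when the body is inlined: each copy
  needs its own label, and RET has to become a jump past the inserted code. }
function MaxInt64(A, B: Int64): Int64; assembler; nostackframe; inline;
asm
{$ifdef WIN64}
  MOV RAX, RCX     { Win64: A in RCX, B in RDX }
  CMP RAX, RDX
  JL @UseB
  RET              { A >= B: result already in RAX }
@UseB:
  MOV RAX, RDX
{$else}
  MOV RAX, RDI     { System V: A in RDI, B in RSI }
  CMP RAX, RSI
  JL @UseB
  RET
@UseB:
  MOV RAX, RSI
{$endif}
end;

begin
  { Two calls in one expression: if inlined, two copies of the body
    end up in the same routine }
  WriteLn(MaxInt64(3, 7) + MaxInt64(-2, -9));
end.
```

As with the attached test program, the result should be identical whether or not the routine is actually inlined.<br>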
<br>
To Florian, I know this work is somewhat unsanctioned, but I would like to show that it can be done in a way that's clean. Even if this is still a definite no, well, I managed to find and squash bug #35065!<br>
<br>
Another e-mail will follow this one that adds a patch that inlines some RTL routines, and a small test program.<br>
<br>
<div>Gareth aka. Kit</div><div><br>
</div><div>NOTE: Make sure you have the current trunk, because this code triggered Internal Error 200208181 due to a bug in "tai_cpu_abstract.ppuload" that was only fixed this morning.<br>
</div><br>
_______________________________________________<br>
fpc-devel maillist - <a href="mailto:fpc-devel@lists.freepascal.org">fpc-devel@lists.freepascal.org</a><br>
<a target="_blank" href="http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel">http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel</a><br>
<br>
</blockquote>
<br>
</blockquote></HTML>