[fpc-devel] Successful implementation of inline support for pure assembler routines on x86

J. Gareth Moreton gareth at moreton-family.com
Mon Mar 18 11:36:55 CET 2019


 I think that is the fundamental difference between intrinsics and
reusable, inlinable assembly language: you would use intrinsics mostly to
access one or two special instructions in a convenient, concise way, while
assembly language is a "give me all the power!" request.  Admittedly,
though, my patch does allow the peephole optimizer to modify the assembly
language in the inlined procedure, mostly to help optimize parameter and
result passing.
 Using assembly language always carries risk; that's pretty much what a
programmer signs up for if they choose to use it.  The ability to inline
particular assembler routines is a means to give as much power and trust as
possible to the programmer.  If the program breaks because of what they've
coded, it's on them to fix it.  Besides, when it comes to debugging, I
find myself commenting out all of my "inline" directives, because even
regular inlined procedures cause problems with the code stepper.  The one
thing I do want to minimise when developing the compiler is the chance of
code behaving differently depending on whether or not it's inlined.
 I'll admit that I'm stubborn and sceptical about certain things, not least
because I know the Free Pascal Compiler has limitations that are not easily
addressed.  For a simple example, if you compute "x div 10" and "x mod 10"
close to each other, the compiler will generate code for each calculation
separately, rather than performing the division once (using a
multiplication trick for speed) and calculating the remainder as
"R := x - (Q * 10)", where Q is the result of "x div 10".  It's even worse
if the divisor is a variable.  A compiler such as Microsoft Visual C++
will spot this and combine the calculations into a single DIV instruction,
taking the quotient from EAX and the remainder from EDX as appropriate,
while Free Pascal will perform the calculations separately.  It's not easy
to fix with the peephole optimizer either, because the second DIV
instruction may use a different register for the divisor.
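The fused calculation described above can be sketched as follows. This is a minimal illustration in Python, assuming an unsigned 32-bit x; the name divmod10 is hypothetical and does not correspond to any FPC or MSVC routine. Multiplying by 0xCCCCCCCD (the ceiling of 2^35 / 10) and shifting right by 35 is the standard reciprocal-multiplication replacement for an unsigned 32-bit division by 10.

```python
# Hypothetical helper: divmod10 is an illustration, not an FPC or MSVC routine.
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT = 35


def divmod10(x: int) -> tuple[int, int]:
    """Return (x div 10, x mod 10) for an unsigned 32-bit x, computing Q once."""
    if not 0 <= x <= 0xFFFFFFFF:
        raise ValueError("x must fit in an unsigned 32-bit register")
    # Quotient by reciprocal multiplication: on x86 this compiles to a
    # multiply and a shift, so no DIV instruction is needed at all.
    q = (x * MAGIC) >> SHIFT
    # Remainder reuses the quotient: R := x - (Q * 10), no second division.
    r = x - q * 10
    return q, r


print(divmod10(12345))  # (1234, 5)
```

For a constant divisor the quotient costs one multiply and one shift, and the remainder only one more multiply and a subtraction. For a variable divisor no such constant exists, which is why the fusion there relies on a single x86 DIV instead, leaving the quotient in EAX and the remainder in EDX.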

 Gareth aka. Kit

 P.S. This is where my work on a "deep optimizer" comes into play, as it
attempts to perform data flow analysis on the used registers, although it
currently only works on MOV instructions.  It works fairly well, but it
would work better if the step that converts virtual registers into real
ones were performed last (or right before the post-peephole optimization
stage), since this step also allocates stack space that might not be
needed if optimization is able to remove the use of a register completely,
freeing it up for something that would otherwise spend its time on the
stack.
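To make the idea of register data flow analysis restricted to MOV instructions more concrete, here is a toy sketch in Python. It is not the deep optimizer's actual code: the instruction representation (opcode, destination, source tuples) and the name remove_redundant_movs are hypothetical. It assumes straight-line code with register-only operands, where every instruction writes only its explicit destination (so instructions such as DIV, which also write EDX, fall outside the model).

```python
# Toy model, hypothetical names: not the FPC deep optimizer's implementation.
# Each instruction is (opcode, destination register, source register).
Instr = tuple[str, str, str]


def remove_redundant_movs(block: list[Instr]) -> list[Instr]:
    """Drop MOVs whose destination already holds the source's value.

    Assumes one basic block (no jumps), register-only operands, and that
    every instruction writes only its explicit destination register.
    """
    holds: dict[str, tuple] = {}  # register -> token for the value it holds

    def value(reg: str) -> tuple:
        # A register not written yet still holds its incoming value.
        return holds.get(reg, ("in", reg))

    out: list[Instr] = []
    for i, (op, dst, src) in enumerate(block):
        if op == "mov":
            if value(dst) == value(src):
                continue  # dst already holds this value: the MOV is redundant
            holds[dst] = value(src)  # dst now carries src's value
        else:
            holds[dst] = ("def", i)  # any other op leaves a brand-new value
        out.append((op, dst, src))
    return out


block = [
    ("mov", "eax", "ebx"),
    ("mov", "ecx", "eax"),
    ("mov", "ecx", "ebx"),  # ECX already holds EBX's value: dropped
    ("add", "eax", "edx"),  # only EAX changes
    ("mov", "ebx", "ecx"),  # EBX and ECX still hold the same value: dropped
]
print(remove_redundant_movs(block))
```

Tracking which value a register holds, rather than which register it was copied from, is what keeps ECX's copy valid after EAX is overwritten by the ADD.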

 On Mon 18/03/19 11:05, Marco van de Voort core at pascalprogramming.org
sent:

  On 3/18/2019 at 8:00 AM, Sven Barth via fpc-devel wrote:
       J. Gareth Moreton wrote on Sun., 17 March 2019, 23:27:

   And I believe that this is the advantage of intrinsics, because here the
compiler *can* decide to use a different register, especially if the
compiler supports instruction scheduling and such.  At work I've worked
with AES-NI and I definitely preferred to work with the intrinsics and
didn't have to care about which registers to use, because the compiler and
optimizer took care of that.

(Well, better double-check the output; it is not always ideal.)

I've seen nice examples in the Simd library
(http://ermig1979.github.io/Simd/index.html [2]), where they use generics
to bundle intrinsics into blocks and then reuse them multiple times, e.g.
three times for the first, bulk and last lines of an image.
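The reuse pattern mentioned here can be sketched in a few lines: one line kernel is written once and then applied to the first line, the bulk lines and the last line, with only the neighbouring rows changing at the edges. This is a hypothetical Python illustration (a 3-tap vertical box filter with clamped edges; the names blur_row and blur_image are invented) and does not reproduce the Simd library's code.

```python
# Hypothetical illustration of the reuse pattern, not Simd library code.
Image = list[list[int]]


def blur_row(above: list[int], row: list[int], below: list[int]) -> list[int]:
    """The shared kernel: 3-tap vertical box filter for one line."""
    return [(a + b + c) // 3 for a, b, c in zip(above, row, below)]


def blur_image(img: Image) -> Image:
    """Run the same kernel on the first, bulk and last lines."""
    n = len(img)
    if n == 0:
        return []
    if n == 1:
        return [blur_row(img[0], img[0], img[0])]
    out = [blur_row(img[0], img[0], img[1])]  # first line: clamp the row above
    # Bulk lines: both neighbours exist.
    out += [blur_row(img[y - 1], img[y], img[y + 1]) for y in range(1, n - 1)]
    out.append(blur_row(img[-2], img[-1], img[-1]))  # last line: clamp the row below
    return out
```

For example, blur_image([[3, 3], [6, 6], [9, 9]]) gives [[4, 4], [6, 6], [8, 8]]: the kernel itself never changes, only the rows handed to it at the edges.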
    That is something that Pascal should stand for: ease of use. Assembler
is not easy to use.  

If something is generic enough to be an intrinsic, it should be an
intrinsic, and made as safe as possible.

Inlinable assembler, however, is a way to put some of that defining power
in the user's hands as well. It doesn't really matter that there are
corner cases, as long as they can be described, since assembler is
inherently unportable anyway. But having something like that is quite
important, I think. Though my examples come more from my embedded work
than from my PC work (even though I use AVX2 there; intrinsics would be
better for many cases).
  _______________________________________________
 fpc-devel maillist - fpc-devel at lists.freepascal.org [3]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel [4]

 

Links:
------
[1] mailto:gareth at moreton-family.com
[2] http://ermig1979.github.io/Simd/index.html
[3] mailto:fpc-devel at lists.freepascal.org
[4] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

