[fpc-devel] Successful implementation of inline support for pure assembler routines on x86

Sun Mar 24 11:33:17 CET 2019

 The main thing is the degree of control that pure assembler gives you
compared with intrinsics.  Someone brought up that intrinsics don't give
you good access to the FLAGS register.  Additionally, unless you do some
rather untidy nested parameter chaining (calling an intrinsic and passing
its result into another intrinsic, several layers deep), you don't have
much control over how the results are stored.  That's not normally a
terrible thing, but if you have a temporary value that you know will be
discarded, you want it kept in a register and never stored on the stack,
for example.

 I know you keep saying the compiler should be smart enough to determine
that, but it's not.  It's not even smart enough to merge nearby "div" and
"mod" operations with the same denominator.  Putting individual cases in
the peephole optimizer can only go so far before you have to redesign it
from the ground up (things like "full tree optimization"), and there comes
a point where you can put so many possible optimisations into a compiler
that it becomes prohibitively slow or takes up too much memory.  The main
problem is that many of the optimisation problems involved, such as
register allocation and instruction scheduling, are NP-complete.  You'll
never get everything, and assembly language is the only way to be sure
you're making the most efficient code in specialised situations where
speed is of paramount importance.
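
 To make the div/mod point concrete, here is a minimal sketch.  The program
and variable names are made up for illustration, and it assumes the
LongInt overload of DivMod from the Math unit.  It only shows the source
pattern in question, not what any particular FPC version emits for it.

```pascal
program DivModSketch;

{$mode objfpc}

uses
  Math;

var
  Numerator, Denominator: LongInt;
  Quotient, Remainder: LongInt;
begin
  Numerator := 47;
  Denominator := 5;

  { Two separate operators sharing one denominator.  On x86 a single
    32-bit IDIV already leaves the quotient in EAX and the remainder in
    EDX, so in principle one division is enough for both lines. }
  Quotient := Numerator div Denominator;
  Remainder := Numerator mod Denominator;
  WriteLn(Quotient, ' ', Remainder);

  { Asking for both results explicitly in one call. }
  DivMod(Numerator, Denominator, Quotient, Remainder);
  WriteLn(Quotient, ' ', Remainder);
end.
```

 Both WriteLn calls print the same pair (9 and 2); the difference is only
in how much work is spent getting there.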

 There's also the issue of maintenance: someone has to write intrinsics
for every single instruction on every single platform and verify that
they behave the way they should.
 I guess we have been spoilt in a way because Pascal has always supported a
clean and efficient way to drop into assembly language if you so choose,
and this is what I've gotten used to rather than the intrinsics of C++.  I
don't like the idea of putting breakpoints on the intrinsics and opening up
the Disassembly window just to check that the compiler isn't blindly
storing temporary values on the stack.
 If I had to give one final reason... there are already functions in the
RTL that are written in pure assembly language and would easily benefit
from being inlined, such as SwapEndian and Trunc.  Otherwise they'd have
to be rewritten to use intrinsics, if anyone remembers to.
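
 For anyone unsure what a pure assembler routine looks like, here is a
rough sketch of a byte-swap routine.  It is not copied from the RTL: the
name SwapEndian32Sketch is hypothetical, and it assumes an x86-64 target
and AT&T assembler syntax.

```pascal
program SwapSketch;

{$mode objfpc}
{$asmmode att}

{$ifndef CPUX86_64}
  {$fatal This sketch assumes an x86-64 target}
{$endif}

{ Hypothetical routine.  The whole body is assembler and there is no stack
  frame, so the argument has to be picked up from the register that the
  calling convention puts it in. }
function SwapEndian32Sketch(Value: LongWord): LongWord; assembler; nostackframe;
asm
{$ifdef WIN64}
  movl %ecx, %eax   { Win64: first integer argument arrives in ECX }
{$else}
  movl %edi, %eax   { System V: first integer argument arrives in EDI }
{$endif}
  bswap %eax        { reverse the four bytes; the result is returned in EAX }
end;

begin
  if SwapEndian32Sketch($11223344) = $44332211 then
    WriteLn('ok')
  else
    WriteLn('mismatch');
end.
```

 As written, every use of such a routine is a real call.  The idea behind
inlining is that the two instructions could be placed directly in the
caller instead.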

 There is one other thing... intrinsics haven't been merged into the trunk
yet, so we can't test them or determine if they are actually what we
desire.
 Gareth aka. Kit

 On Sun 24/03/19 09:45 , Florian Klämpfl florian at freepascal.org sent:
 On 18.03.2019 at 02:57, Ben Grasset wrote:
 > On Sun, Mar 17, 2019 at 1:57 PM Florian Klämpfl wrote:
 >
 > > How is it better than intrinsics support (similar to gcc/icc etc.)?
 >
 > Well, it wouldn't be better than a literal equivalent to those
 > intrinsics, if that's what we're talking about. By which I mean, like,
 > say how in Clang/GCC (or languages such as Rust that use LLVM), if you
 > do _mm_loadl_pd or whatever, that translates not to a function call but
 > directly to the "inlined" assembler instructions (at least in release
 > builds.)

 Yes, that's what I mean. So far I have not seen a single advantage of
inlining pure assembler routines over such intrinsics.