[fpc-devel] Vectorization

J. Gareth Moreton gareth at moreton-family.com
Wed Feb 7 14:10:07 CET 2018

 Hi John,

 I am on the mailing list.  I don't actually have write access to the SVN
repository, so I can only submit patches for review and testing, especially
as Florian has his own plans in the works.

 Currently, vectorcall only really benefits assembly language programmers
because the vectorisation system in the Free Pascal Compiler is shaky at
best, although slowly improving.  The idea is that small blocks of
floating-point numbers, the most frequent example being a 4-component
vector, are passed into a function more efficiently in a single register
designed to handle such data.
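To make the "4-component vector" concrete, here is a small C sketch. It declares four 32-bit floats (the equivalent of FPC's Single) aligned on a 16-byte boundary, which together make up exactly one 128-bit XMM register. `Vec4` and `vec4_component_offset` are hypothetical names used only for illustration, and the sketch assumes a C11 compiler and 4-byte floats.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-component vector: four 32-bit floats aligned on a
   16-byte boundary.  4 x 32 bits = 128 bits, the width of one XMM register. */
typedef struct {
    _Alignas(16) float v[4];
} Vec4;

/* Compile-time checks that the layout fills exactly one 128-bit register. */
_Static_assert(sizeof(Vec4) == 16, "Vec4 should be 16 bytes");
_Static_assert(_Alignof(Vec4) == 16, "Vec4 should be 16-byte aligned");

/* Byte offset of component i; the components are packed back to back. */
static size_t vec4_component_offset(int i) {
    return offsetof(Vec4, v) + (size_t)i * sizeof(float);
}
```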

 The advantage of the 'vector stuff' is that you can perform the same
mathematical routine on multiple units of data at once - a simple example
would be adding two vectors together.  Traditionally, you would have to
load, compute and store each floating-point component sequentially, one at
a time, while SIMD (Single Instruction, Multiple Data) collapses this into
a single set of instructions that process all four components at once,
which can give a large performance boost.  The vectorisation system in the
compiler attempts to take advantage of this by, for example, unrolling
for-loops so it can calculate several iterations at once (not always
possible if the outcome of one iteration depends on the ones run
immediately prior).
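The following C sketch contrasts the two approaches. `add_scalar` handles one component per iteration. `add_simd` processes four floats per iteration with the SSE intrinsics `_mm_loadu_ps`, `_mm_add_ps` (the ADDPS instruction) and `_mm_storeu_ps`, and finishes any leftover elements with a scalar tail loop. It falls back to plain scalar code on targets without SSE. `prefix_sum` is an example of a loop whose iterations depend on the previous one, so it cannot simply be computed four at a time. All function names are hypothetical, and this is not how FPC's vectoriser is implemented.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Scalar version: load, add and store one component at a time. */
static void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SIMD version: four floats per iteration, then a scalar tail for the rest. */
static void add_simd(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* one ADDPS for 4 sums */
    }
#endif
    for (; i < n; i++)  /* remaining elements (all of them without SSE) */
        out[i] = a[i] + b[i];
}

/* Running sum: each iteration needs the previous result (a loop-carried
   dependency), so the iterations cannot simply be done four at a time. */
static void prefix_sum(const float *a, float *out, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i];
        out[i] = acc;
    }
}
```

GCC and Clang can often auto-vectorise a loop like `add_scalar` at higher optimisation levels. A loop like `prefix_sum` is much harder for them to handle.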

 On Linux, 'vectorcall' will be ignored, while on 64-bit Windows, there
should be no difference in performance if none of the parameters or the
return value are vector types, but if they are, there's a performance gain
because they are passed in CPU registers instead of on the stack, among
other things.
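Microsoft's C/C++ compiler exposes the same convention as the `__vectorcall` keyword. A rough C analogue of FPC ignoring 'vectorcall' on Linux is a macro that expands to `__vectorcall` only when building with MSVC for x64, and to nothing everywhere else, so the code compiles unchanged with the platform's default convention. `VECTORCALL`, `Vec4` and `vec4_add` are hypothetical names, and which registers actually carry the arguments is left to the compiler.

```c
#include <assert.h>

/* Hypothetical macro: MSVC's __vectorcall on x64, nothing elsewhere, so
   other platforms simply use their default calling convention. */
#if defined(_MSC_VER) && defined(_M_X64)
#define VECTORCALL __vectorcall
#else
#define VECTORCALL
#endif

typedef struct {
    float v[4];
} Vec4;

/* Component-wise addition; the calling convention is the only thing
   that differs between platforms. */
static Vec4 VECTORCALL vec4_add(Vec4 a, Vec4 b) {
    Vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}
```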

 To answer your questions:

 1) For the most part, AMD processors follow the same standard set of
opcodes as Intel.  There may be some minor differences in performance and
optimisation at the lowest level, but normally you don't have to worry
about it.

 2) Any Intel or AMD processor running a 64-bit version of Windows is
guaranteed to have at least SSE2 available, because SSE2 is part of the
baseline x86-64 instruction set (and Windows simply refuses to install
otherwise).  The same is true of 64-bit Linux, especially as the standard
for passing parameters into procedures under Linux (the System V ABI)
utilises the SSE registers for floating-point arguments.
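Because SSE2 is part of the x86-64 baseline, compilers assume it without any extra flags when targeting 64-bit code. The C sketch below checks this at compile time. It relies on GCC and Clang predefining `__SSE2__` when SSE2 is enabled (the default for x86-64), and on MSVC always assuming SSE2 for x64 (`_M_X64`). The function names are hypothetical.

```c
#include <assert.h>

/* 1 if this translation unit is being compiled for x86-64. */
static int target_is_x86_64(void) {
#if defined(__x86_64__) || defined(_M_X64)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the compiler is allowed to emit SSE2 instructions for this target. */
static int sse2_assumed(void) {
#if defined(__SSE2__) || defined(_M_X64)
    return 1;
#else
    return 0;
#endif
}
```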

 3) Basically, the implementation of the Linux calling convention in the
Free Pascal Compiler wasn't perfect.  For the most part, it conforms to the
standard (the System V ABI is the specification), but it doesn't handle
vector data properly.  Generally, it only passes parameters of type Single
into XMM registers in blocks of two, rather than four as is supported by
SSE.  If you had a vector of length 4 (very common in graphical
applications), the compiler would split its values across two XMM
registers, even if the data was originally aligned on a 16-byte boundary
(it's far faster to access data aligned this way).  This was inefficient
in that it required twice as many reads and writes to pass the data, but
it also used up an extra register.  Only 8 XMM registers (XMM0 to XMM7)
can carry parameters under the System V ABI.  x86-64 processors have 16
XMM registers, or 32 with AVX-512, but neither the System V ABI nor
vectorcall permits the extra ones to be used for parameters.  If a function
takes a lot of parameters, the ones that don't fit into the free XMM
registers have to be passed on the stack.  Also, using as few registers as
possible leaves more free for the compiler to play with.
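The register arithmetic above can be sketched as a small C model. It is a deliberate simplification, not FPC's actual classification code. The old behaviour gives each 8-byte chunk ("eightbyte") its own XMM register, while SSEUP lets a 16-byte vector occupy a single register. The model assumes the 8 System V parameter registers and 4-byte Singles, and it only claims to be meaningful for 4-component vectors. All names are hypothetical.

```c
#include <assert.h>

/* Simplified model: XMM0-XMM7 are the vector parameter registers
   under the System V ABI. */
enum { XMM_PARAM_REGS = 8 };

/* Old behaviour: one XMM register per eightbyte, i.e. two Singles each. */
static int regs_split(int num_singles) {
    int bytes = num_singles * 4;
    return (bytes + 7) / 8;
}

/* With SSEUP: a 16-byte vector travels in a single XMM register. */
static int regs_sseup(int num_singles) {
    int bytes = num_singles * 4;
    return (bytes + 15) / 16;
}

/* How many of `count` vector parameters no longer fit and go on the stack. */
static int spilled_to_stack(int count, int regs_per_vector) {
    int fit = XMM_PARAM_REGS / regs_per_vector;
    return count > fit ? count - fit : 0;
}
```

Under this model, six 4-component vectors fit entirely in registers with SSEUP, whereas the split scheme sends two of them to the stack.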

 It's a bit complicated, I'm afraid, but I hope that helps.

 Gareth aka. Kit

 On Wed 07/02/18 11:44 , John Lee johnelee0 at gmail.com sent:
 Hi Kit/Gareth
 Thanks for this work - I've been following all your changes to win64.
Can't say I understand all of the vector ones. Be good to commit to trunk
asap - tho' I gather you are waiting for some Florian mods.
 Just a few qs - maybe these could be clarified in comments/example/tests
 1) How does this work on amd processors that have vector stuff?
 2) Exactly which processors amd/intel have this stuff?
 3) don't understand why this stuff doesn't work w/o mods on linux, which I
think you say somewhere.
 Thanks again john
 PS assume you are on devel list tho' doing a reply to this email in gmail
copies you and devel? Dohhhhh.


  On 7 February 2018 at 08:23, J. Gareth Moreton  wrote:
 Hi everyone,

 After a lot of work, I have implemented 'vectorcall' for Win64, and made
a patch for Lazarus to recognise the keyword in the IDE and highlight it
accordingly.

 FPC vectorcall patch:

 https://bugs.freepascal.org/view.php?id=32781 [2]

 Lazarus vectorcall support patch:

 https://bugs.freepascal.org/view.php?id=33134 [3]

 The vectorcall patch also contains the code in the patch for issue #27870,
since the two have a lot in common.  So far, I have confirmed that FPC and
Lazarus successfully compile on Win32 and Win64, but I know for a fact that
the code changes affect Linux 64-bit as well, in that the SSEUP_CLASS is
now properly supported (vectorcall reuses the System V ABI code for
convenience and compatibility), so FPC's implementation of the System V ABI
should now properly support 128-bit SSE vectors.

 Note that 256-bit and 512-bit vectors are currently disabled in the code,
since the compiler does not fully support vectors of this size yet, and
Florian is working on this himself.

 I have provided 3 test programs in #32781 that should compile under both
Win64 and Linux 64-bit (each will throw a custom $FATAL error if it's not
one of these two platforms) in order to test correct code production and
register allocation.  However, testing will have to be very extensive for
this addition.

 I hope this will serve the x86-64 assembly programmers well - have fun!

 Gareth aka. Kit
 fpc-devel maillist  -  fpc-devel at lists.freepascal.org [4]
 http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel [5]



[1] mailto:gareth at moreton-family.com
[2] https://bugs.freepascal.org/view.php?id=32781
[3] https://bugs.freepascal.org/view.php?id=33134
[4] mailto:fpc-devel at lists.freepascal.org
[5] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
[6] mailto:fpc-devel at lists.freepascal.org
[7] http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
