[fpc-devel] Difficulty in specifying record alignment
J. Gareth Moreton
gareth at moreton-family.com
Mon Oct 21 00:57:42 CEST 2019
Hi everyone,
I'm trying to make some optimisation improvements to UComplex so the
compiler can take advantage of SSE2 or AVX features without needing to
write specialised code (other than using the "vectorcall" directive
under Win64). I am having some difficulty though.
The record type "complex" is defined as follows:
type complex = record
  re : real;
  im : real;
end;
(Real is equivalent to Double on x86_64)
This also corresponds with how a complex number is defined for Extended
Pascal. Currently, when compiled under x86_64-win64, the fields are
placed on 8-byte boundaries, but because the type as a whole is also on
an 8-byte boundary (not 16-byte), the compiler cannot take advantage of
the XMM registers when passing such a construct as a parameter or return
value, and hence has to pass it by reference. For high-speed scientific
programming, this quickly adds up to a notable penalty. For example,
here is the assembly language generated for adding together two complex
numbers on x86_64-win64 ("Z := Z + X;"):
movsd U_$P$COMPLEX_$$_Z(%rip),%xmm0
addsd U_$P$COMPLEX_$$_X(%rip),%xmm0
movsd %xmm0,40(%rsp)
movsd U_$P$COMPLEX_$$_Z+8(%rip),%xmm0
addsd U_$P$COMPLEX_$$_X+8(%rip),%xmm0
movsd %xmm0,48(%rsp)
movq 40(%rsp),%rax
movq %rax,U_$P$COMPLEX_$$_Z(%rip)
movq 48(%rsp),%rax
movq %rax,U_$P$COMPLEX_$$_Z+8(%rip)
Even if the reads and writes to memory cannot be removed, treating the
complex data type as an aligned array of doubles should be able to yield
far more efficient code (might require some compiler quirks so it
detects the component-wise addition in the inlined + operator for the
complex type):
movapd U_$P$COMPLEX_$$_Z(%rip),%xmm0
addpd U_$P$COMPLEX_$$_X(%rip),%xmm0
movapd %xmm0,U_$P$COMPLEX_$$_Z(%rip)
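For reference, the component-wise addition the compiler would need to recognise is roughly the following. This is a minimal sketch and not the exact uComplex source; the program name and the test values are made up for illustration.

```pascal
program ComplexAddSketch;

{$mode objfpc}

type
  complex = record
    re : real;
    im : real;
  end;

{ Component-wise addition: both fields are summed independently,
  which is the pattern a single ADDPD could cover. }
operator + (const z1, z2 : complex) z : complex; inline;
begin
  z.re := z1.re + z2.re;
  z.im := z1.im + z2.im;
end;

var
  X, Z : complex;
begin
  X.re := 1.0; X.im := 2.0;
  Z.re := 3.0; Z.im := 4.0;
  Z := Z + X;                 { the statement from the listing above }
  WriteLn(Z.re:0:1, ' ', Z.im:0:1);
end.
```

Compiling this with -al (keep the assembler file) should make it possible to compare the generated code against the two listings above.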
The problem here is that there's no practical way to force the entire
record's alignment onto a 16-byte boundary (a requirement for
"vectorcall") without also snapping each individual field to such a
boundary. Strictly speaking, I don't think the 16-byte boundary is a
requirement for the System V ABI (the Unix calling convention for 64-bit
Intel processors), and there are unaligned move instructions to
accommodate this (which have traditionally been slightly slower than
the aligned counterparts), but currently the Free Pascal Compiler
demands the alignment, mainly because of shared compiler code between
Windows and non-Windows builds.
The only way to enforce a construct where the record is on a 16-byte
boundary but the two 8-byte fields are packed is to use an array
field, e.g.:
{$push}
{$codealign RECORDMIN=16}
type complex = record
  part: array[0..1] of real;
end;
{$pop}
Mapping "re" to "part[0]" and "im" to "part[1]" using a union is
impossible because "im" will be put on the next 16-byte boundary and be
its own separate entity. Other constructs such as nested unions are
possible, but they would break backward compatibility with code that
uses the uComplex unit.
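A small test program along these lines can be used to check the layouts being discussed. It is only a sketch: the type names TPlain, TSnapped and TArrayForm are made up for illustration, and the numbers it prints depend on the compiler version and target, so no expected output is given.

```pascal
program AlignCheck;

{$mode objfpc}

type
  { Layout as currently declared: two fields on 8-byte boundaries. }
  TPlain = record
    re : real;
    im : real;
  end;

{$push}
{$codealign RECORDMIN=16}
type
  { Same two fields declared under RECORDMIN=16. }
  TSnapped = record
    re : real;
    im : real;
  end;

  { Array workaround: a single field holding two packed elements. }
  TArrayForm = record
    part : array[0..1] of real;
  end;
{$pop}

var
  p : TPlain;
  s : TSnapped;
  a : TArrayForm;
begin
  { Report each record's size and the byte offset of its second value. }
  WriteLn('TPlain:     size ', SizeOf(p),
          ', im offset ', PtrUInt(@p.im) - PtrUInt(@p));
  WriteLn('TSnapped:   size ', SizeOf(s),
          ', im offset ', PtrUInt(@s.im) - PtrUInt(@s));
  WriteLn('TArrayForm: size ', SizeOf(a),
          ', part[1] offset ', PtrUInt(@a.part[1]) - PtrUInt(@a));
  { Zero here means this variable landed on a 16-byte boundary. }
  WriteLn('TArrayForm address mod 16: ', PtrUInt(@a) mod 16);
end.
```

If "im" in TSnapped is reported at offset 16 while "part[1]" in TArrayForm is at offset 8, that is the behaviour described above.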
A while ago I requested a means to specify an alignment on a per-type
basis so it is easier for third-party programmers to take advantage of
the extra efficiency brought about by vectorcall and the System V ABI:
https://bugs.freepascal.org/view.php?id=32780 - this effectively boils
down to being able to define something akin to the following:
type complex = record
  re : real;
  im : real;
end {$ifdef CPUX86_64} align 16 {$endif CPUX86_64};
It was assigned to Maciej last year, but hasn't seen any progress since.
If not that alignment feature, is there any other way to cleanly enforce
a 16-byte boundary for such a packed type without having to completely
redesign it to the point that it breaks compatibility?
Gareth aka. Kit
P.S. I suppose what I'm getting at is that taking advantage of the
System V ABI's vectorising capabilities is incredibly fiddly and, even
if you know how the compiler works internally, you are not guaranteed
to get it to work. Vectorcall was always fiddly because of the
alignment requirement, but any cross-platform solution should make it
much easier to get right.