[fpc-devel] Difficulty in specifying record alignment
Florian Klämpfl
florian at freepascal.org
Mon Oct 21 21:00:49 CEST 2019
On 21.10.19 at 00:57, J. Gareth Moreton wrote:
> Hi everyone,
>
> I'm trying to make some optimisation improvements to UComplex so the
> compiler can take advantage of SSE2 or AVX features without needing to
> write specialised code (other than using the "vectorcall" directive
> under Win64). I am having some difficulty though.
>
> The record type "complex" is defined as follows:
>
> type complex = record
>   re : real;
>   im : real;
> end;
>
> (Real is equivalent to Double on x86_64)
>
> This also corresponds with how a complex number is defined for Extended
> Pascal. Currently, when compiled under x86_64-win64, the fields are
> placed on 8-byte boundaries, but because the type as a whole is also on
> an 8-byte boundary (not 16-byte), the compiler cannot take advantage of
> the XMM registers when passing such a construct as a parameter or return
> value, and hence has to pass it by reference. For high-speed scientific
> programming, this quickly adds up to a notable penalty. For example,
> the compiled assembly language for adding together two complex numbers
> on x86_64-win64 ("Z := Z + X;"):
>
> movsd U_$P$COMPLEX_$$_Z(%rip),%xmm0
> addsd U_$P$COMPLEX_$$_X(%rip),%xmm0
> movsd %xmm0,40(%rsp)
> movsd U_$P$COMPLEX_$$_Z+8(%rip),%xmm0
> addsd U_$P$COMPLEX_$$_X+8(%rip),%xmm0
> movsd %xmm0,48(%rsp)
> movq 40(%rsp),%rax
> movq %rax,U_$P$COMPLEX_$$_Z(%rip)
> movq 48(%rsp),%rax
> movq %rax,U_$P$COMPLEX_$$_Z+8(%rip)
>
> Even if the reads and writes to memory cannot be removed, treating the
> complex data type as an aligned array of doubles should be able to yield
> far more efficient code (might require some compiler quirks so it
> detects the component-wise addition in the inlined + operator for the
> complex type):
>
> movapd U_$P$COMPLEX_$$_Z(%rip),%xmm0
> addpd U_$P$COMPLEX_$$_X(%rip),%xmm0
> movapd %xmm0,U_$P$COMPLEX_$$_Z(%rip)
>
> The problem here is that there's no practical way to force the entire
> record's alignment onto a 16-byte boundary (a requirement for
> "vectorcall") without also snapping each individual field to such a
> boundary. Strictly speaking, I don't think the 16-byte boundary is a
> requirement for the System V ABI (the Unix calling convention for 64-bit
> Intel processors),
The stack is 16-byte aligned; aligning data is up to the compiler.
> and there are unaligned move instructions to
> accommodate this (which have traditionally been slightly slower than
> the aligned counterparts), but currently the Free Pascal Compiler
> demands the alignment, mainly because of shared compiler code between
> Windows and non-Windows builds.
Each target can have its own alignment requirements.
>
> The only way to enforce a construct where the record is on a 16-byte
> boundary but the two 8-byte fields are packed is to use an array
> element; e.g.:
>
> {$push}
> {$codealign RECORDMIN=16}
> type complex = record
>   part: array[0..1] of real;
> end;
> {$pop}
>
> Mapping "re" to "part[0]" and "im" to "part[1]" using a union is
> impossible because "im" will be put on the next 16-byte boundary and be
> its own separate entity. Other constructs such as nested unions are
> possible, but this will break backward compatibility with code that uses
> the uComplex unit.
>
> A while ago I requested a means to specify an alignment on a per-type
> basis so it is easier for third-party programmers to take advantage of
> the extra efficiency brought about by vectorcall and the System V ABI:
> https://bugs.freepascal.org/view.php?id=32780 - this effectively boils
> down to being able to define something akin to the following:
>
> type complex = record
>   re : real;
>   im : real;
> end {$ifdef CPUX86_64} align 16 {$endif CPUX86_64};
>
> It was assigned to Maciej last year, but hasn't seen any progress since.
>
> If not that alignment feature, is there any other way to cleanly enforce
> a 16-byte boundary for such a packed type without having to completely
> redesign it to the point that it breaks compatibility?
What's the problem with
{$push}
{$codealign RECORDMIN=16}
type complex = record
re : real;
im : real;
end;
{$pop}
?
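
One way to see what that declaration actually produces on a given target is to print the layout. The following is only a sketch: the program name and the variable Z are made up for illustration, and the values it prints depend on the target and compiler version, so none are claimed here.

```pascal
program CheckComplexLayout;

{$mode objfpc}

{$push}
{$codealign RECORDMIN=16}
type
  complex = record
    re : real;
    im : real;
  end;
{$pop}

var
  Z: complex;  // hypothetical test variable, not part of uComplex

begin
  // Total size of the record, including any padding the compiler added.
  WriteLn('SizeOf(complex) = ', SizeOf(complex));
  // Field offsets relative to the start of the record.
  WriteLn('offset of re    = ', PtrUInt(@Z.re) - PtrUInt(@Z));
  WriteLn('offset of im    = ', PtrUInt(@Z.im) - PtrUInt(@Z));
  // 0 here means this particular variable sits on a 16-byte boundary.
  WriteLn('address mod 16  = ', PtrUInt(@Z) mod 16);
end.
```

If the offset of im comes out as 8 and the size as 16, the fields are still packed; an offset of 16 would mean each field was pushed to its own 16-byte boundary.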