[fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
J. Gareth Moreton
gareth at moreton-family.com
Sun Oct 27 11:23:14 CET 2019
The following passes everything through XMM0:
#include <cmath>
#include <emmintrin.h>

double Mod(__m128d z)
{
    return sqrt((z[0] * z[0]) + (z[1] * z[1]));
}

int main()
{
    __m128d z;
    z[0] = 0; z[1] = 1;
    double d = Mod(z);
}
I will admit that it's very fiddly to get right. All of my attempts to
map an anonymous struct to __m128d via a union (so you could call z.re
and z.im rather than access the array elements) were unsuccessful. C++
is not very friendly with vector types and you have to go out of your
way to get the compiler to be efficient with them, but the System V ABI
does support utilising the full vector registers.
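For anyone who wants named access to the two parts without the union trick, one possible workaround is to wrap the vector in a plain struct and read the lanes through SSE2 intrinsics. This is only a sketch: ComplexDouble, MakeComplex and the re()/im() accessors are hypothetical names, not something from my tests above, and while a struct holding a single __m128d should still be classed SSE/SSEUP under System V, it is worth checking the generated assembly before relying on it.

```cpp
#include <cmath>
#include <emmintrin.h>

// Hypothetical wrapper (an assumption for illustration, not tested code from this thread).
struct ComplexDouble {
    __m128d v;  // low lane holds the real part, high lane the imaginary part

    // _mm_cvtsd_f64 returns the low lane of the vector.
    double re() const { return _mm_cvtsd_f64(v); }
    // _mm_unpackhi_pd(v, v) copies the high lane into the low lane first.
    double im() const { return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v)); }
};

// _mm_set_pd takes its arguments as (high, low).
inline ComplexDouble MakeComplex(double re, double im) {
    return ComplexDouble{_mm_set_pd(im, re)};
}

inline double Mod(ComplexDouble z) {
    return std::sqrt(z.re() * z.re() + z.im() * z.im());
}
```

The intrinsics avoid the GCC/Clang-specific z[0] subscripting, at the cost of an extra shuffle to read the imaginary part.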
It took me a while to work out how passing a record type with two
single-precision elements into just XMM0 is correct, but this is because
the record type as a whole has a size of eight bytes, and gets passed as
a single argument of class SSE. If the function parameters are instead
two separate arguments, then they get passed individually through XMM0
and XMM1. It seems you have to interpret this document very literally
to get it right: https://www.uclibc.org/docs/psABI-x86_64.pdf
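To make the distinction concrete, here is a minimal sketch of the two cases. ComplexSingle, ModRecord and ModSeparate are hypothetical names, and the register assignments in the comments are my reading of the ABI document rather than something the code itself can prove; the assembly output is the real check.

```cpp
#include <cmath>

// Hypothetical record type: two single-precision fields, eight bytes in total.
struct ComplexSingle {
    float re;
    float im;
};

static_assert(sizeof(ComplexSingle) == 8, "record should occupy a single eightbyte");

// The whole record is one eightbyte of class SSE, so under System V
// both fields are expected to arrive packed in XMM0.
float ModRecord(ComplexSingle z) {
    return std::sqrt(z.re * z.re + z.im * z.im);
}

// Two separate float arguments are classified individually, so they are
// expected to arrive in XMM0 and XMM1 respectively.
float ModSeparate(float re, float im) {
    return std::sqrt(re * re + im * im);
}
```

Both functions compute the same result; only the way the arguments reach them should differ.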
Gareth aka. Kit
On 27/10/2019 08:13, Florian Klämpfl wrote:
> On 23.10.19 at 22:36, J. Gareth Moreton wrote:
>> So I did a bit of reading after finding the "mpx-linux64-abi.pdf"
>> document. As I suspected, the System V ABI is like vectorcall when
>> it comes to using the XMM registers... only the types __m128,
>> __float128 and _Decimal128 use the "SSEUP" class and hence use the
>> entire register. The types are opaque, but both their size and
>> alignment are 16 bytes, so I think anything that abides by those
>> rules can be considered equivalent.
>>
>> If the complex type is unaligned, the two fields get their own XMM
>> register. If aligned, they both go into %xmm0. At least that is
>> what I gathered from reading the document - it's a little unclear
>> sometimes.
>
> I briefly tested with Godbolt (https://godbolt.org/): records of two
> doubles are passed in two XMM registers regardless of alignment, while
> records of two floats (i.e. singles) are passed in one XMM register.