<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>The Microsoft x64 ABI is a bit restrictive when it comes to record
types; as described <a moz-do-not-send="true"
href="https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019">here</a>,
"Structs and unions of size 8, 16, 32, or 64 bits, and __m64
types, are passed as if they were integers of the same size." So
unfortunately, a single-precision complex number is treated as a
64-bit structure and passed as an integer. The System V ABI, on
the other hand, would pass the two components through the lower 64
bits of XMM0. Vectorcall, theoretically, should put the two
components into XMM0 and XMM1, because the complex type would be
considered a "homogeneous vector aggregate" (with floats as
1-dimensional vectors).</p>
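<p>To make the size rule concrete, here is a minimal C sketch of how
the two record layouts fall on either side of it. The type and
function names are made up purely for illustration, and it assumes
the usual 4-byte float and 8-byte double; it only models the quoted
rule of the default convention, not vectorcall.</p>
<pre>
```c
/* Sketch only: all names below are hypothetical. */

/* Single-precision complex: 2 x 4 bytes = 8 bytes (64 bits). */
struct complex_single { float re; float im; };

/* Double-precision complex: 2 x 8 bytes = 16 bytes (128 bits). */
struct complex_double { double re; double im; };

/* Record sizes as the compiler lays them out. */
unsigned complex_single_size(void)
{
    return (unsigned)sizeof(struct complex_single);
}

unsigned complex_double_size(void)
{
    return (unsigned)sizeof(struct complex_double);
}

/* Returns 1 when a record of this size matches the quoted rule
   (8, 16, 32 or 64 bits), i.e. it is passed as if it were an
   integer of the same size; returns 0 otherwise. */
int passed_as_integer(unsigned size_in_bytes)
{
    switch (size_in_bytes) {
    case 1: case 2: case 4: case 8:
        return 1;
    default:
        return 0; /* per the linked page, other sizes go by reference */
    }
}
```
</pre>
<p>With these definitions, passed_as_integer(complex_single_size())
yields 1 and passed_as_integer(complex_double_size()) yields 0: the
8-byte single-precision record hits the integer rule, while the
16-byte double-precision one does not.</p>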
<p>I think the overhead that comes with issues such as this is the
reason why vectorcall was developed in the first place.<br>
</p>
<p>Gareth aka. Kit</p>
<div class="moz-cite-prefix">On 12/11/2019 16:05, Marco van de Voort
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:d2b46e1b-c16e-3738-82a2-668bdb41e066@pascalprogramming.org">
<br>
On 12/11/2019 at 16:08, J. Gareth Moreton wrote:
<br>
<blockquote type="cite">It's true. With VMULSS, only the first
parameter (third parameter under Intel notation) can be an
address (source: Intel(R) 64 and IA-32 Architectures Software
Developer's Manual, Volume 2B, Page 4-154).
<br>
<br>
I'll see if I can work in that optimisation for the commutative
operations (+ and *) at some point from the node side.
<br>
</blockquote>
<br>
Thanks.
<br>
<br>
Another tidbit I noticed while playing with (elements of) the
complex patch is that if I set the element size to double
(re:double;im:double), vectorcall loads all the data into
registers.
<br>
<br>
However, if I make it single (i.e. the tcomplex is 8 bytes), the
records are loaded into integer registers, and the compiler stores
them to the stack and then reloads them.
<br>
<br>
This matters less for me since it won't vectorize anyway (see the
inline and philosophy thread). I think I'll change this routine to
assembler, accepting a pointer and load and store from
that thread.
<br>
<br>
_______________________________________________
<br>
fpc-devel maillist - <a class="moz-txt-link-abbreviated" href="mailto:fpc-devel@lists.freepascal.org">fpc-devel@lists.freepascal.org</a>
<br>
<a class="moz-txt-link-freetext" href="https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel">https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel</a>
<br>
<br>
</blockquote>
</body>
</html>