<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>To show current progress, here is the routine emitted under .section
.text.n_system.math.vectors$_$tvector3d_$__$$_$plus$tvector3d$tvector3d$$tvector3d,"ax"
("class operator TVector3D.+(const aVector1, aVector2:
TVector3D): TVector3D;") on x86_64-win64. It performs a
component-wise addition of two 4-component vectors; the inputs are
pointed to by %rdx and %r8, and the result is pointed to by %rcx.
Before:</p>
<p> movss (%rdx),%xmm0<br>
addss (%r8),%xmm0<br>
movss %xmm0,(%rcx)<br>
movss 4(%rdx),%xmm0<br>
addss 4(%r8),%xmm0<br>
movss %xmm0,4(%rcx)<br>
movss 8(%rdx),%xmm0<br>
addss 8(%r8),%xmm0<br>
movss %xmm0,8(%rcx)<br>
movss 12(%rdx),%xmm0<br>
addss 12(%r8),%xmm0<br>
movss %xmm0,12(%rcx)<br>
ret<br>
</p>
<p>After:</p>
<p> movups (%rdx),%xmm0<br>
addps (%r8),%xmm0<br>
movups %xmm0,(%rcx)<br>
ret<br>
</p>
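<p>For readers who prefer source to assembly, the two listings above
correspond roughly to the following C sketch. The Vec4 record and both
function names are hypothetical stand-ins, not taken from the FPC
sources; the packed version uses the SSE intrinsics that map to
movups and addps, and falls back to scalar code when SSE is not
available.</p>

```c
#if defined(__SSE__) || defined(_M_X64) || defined(__x86_64__)
#include <xmmintrin.h>   /* SSE intrinsics (x86 only) */
#define VEC_HAVE_SSE 1
#endif

/* Hypothetical 4-float record standing in for TVector3D. */
typedef struct { float x, y, z, w; } Vec4;

/* "Before": four independent scalar adds (movss / addss / movss each). */
static void vec4_add_scalar(Vec4 *r, const Vec4 *a, const Vec4 *b) {
    r->x = a->x + b->x;
    r->y = a->y + b->y;
    r->z = a->z + b->z;
    r->w = a->w + b->w;
}

/* "After": unaligned loads, one packed add, one unaligned store. */
static void vec4_add_packed(Vec4 *r, const Vec4 *a, const Vec4 *b) {
#ifdef VEC_HAVE_SSE
    __m128 va = _mm_loadu_ps((const float *)a);   /* movups */
    __m128 vb = _mm_loadu_ps((const float *)b);
    _mm_storeu_ps((float *)r, _mm_add_ps(va, vb)); /* addps, movups */
#else
    vec4_add_scalar(r, a, b);  /* portable fallback, same result */
#endif
}
```

<p>Both functions produce the same result; the difference is only in
how many instructions the compiler needs to get there.</p>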
<p>(Note that this unit is NOT suitable for 3D graphics, because the
W-coordinate is treated the same as X, Y and Z rather than being
kept at, say, 1.)<br>
</p>
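<p>To make the W-coordinate remark concrete, here is a small C sketch.
The Vec4 type and both function names are hypothetical; the second
function is only an assumption about what a graphics-oriented type
might do instead, not something the unit provides.</p>

```c
/* Hypothetical 4-float record standing in for TVector3D. */
typedef struct { float x, y, z, w; } Vec4;

/* What a 4-lane packed add does: W is summed like any other lane. */
static Vec4 add_all_lanes(Vec4 a, Vec4 b) {
    Vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Assumed graphics-style alternative: add X, Y, Z and leave W alone. */
static Vec4 add_xyz_keep_w(Vec4 a, Vec4 b) {
    Vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w };
    return r;
}
```

<p>Adding two vectors that both have W = 1 with the first function
yields W = 2, which is why the result is no longer a valid homogeneous
point.</p>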
<p>It's not perfect though. Where vectorcall or the System V ABI is
concerned, it can produce worse code because it constantly needs
to break up the vector to manipulate individual components, and
there may be some unnecessary reads and writes between the stack
and XMM registers, but I'm working on it, bit by bit.</p>
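<p>As a rough illustration of the extra work involved: under the
System V x86-64 ABI, a 16-byte record of four singles passed by value
arrives as two halves, each holding two floats in the low lanes of its
own XMM register. The C sketch below models joining the halves,
adding, and splitting again. The function name and the array-based
interface are hypothetical, chosen only to keep the example
self-contained; it is not how the compiler's internal procedures are
written.</p>

```c
#if defined(__SSE__) || defined(_M_X64) || defined(__x86_64__)
#include <xmmintrin.h>   /* SSE intrinsics (x86 only) */
#define VEC_HAVE_SSE 1
#endif

/* Join two halves (two floats each) into one 4-float vector per operand,
   add the operands, and split the sum back into halves. */
static void add_via_halves(const float a_lo[2], const float a_hi[2],
                           const float b_lo[2], const float b_hi[2],
                           float r_lo[2], float r_hi[2]) {
#ifdef VEC_HAVE_SSE
    /* Model each incoming half as the low two lanes of its own register. */
    __m128 alo = _mm_setr_ps(a_lo[0], a_lo[1], 0.0f, 0.0f);
    __m128 ahi = _mm_setr_ps(a_hi[0], a_hi[1], 0.0f, 0.0f);
    __m128 blo = _mm_setr_ps(b_lo[0], b_lo[1], 0.0f, 0.0f);
    __m128 bhi = _mm_setr_ps(b_hi[0], b_hi[1], 0.0f, 0.0f);

    /* movlhps: result lanes are {lo0, lo1, hi0, hi1}. */
    __m128 a = _mm_movelh_ps(alo, ahi);
    __m128 b = _mm_movelh_ps(blo, bhi);

    /* One packed add on the joined vectors. */
    float out[4];
    _mm_storeu_ps(out, _mm_add_ps(a, b));

    /* Split the sum back into the two halves a by-value return would need. */
    r_lo[0] = out[0]; r_lo[1] = out[1];
    r_hi[0] = out[2]; r_hi[1] = out[3];
#else
    /* Portable fallback with the same result. */
    r_lo[0] = a_lo[0] + b_lo[0]; r_lo[1] = a_lo[1] + b_lo[1];
    r_hi[0] = a_hi[0] + b_hi[0]; r_hi[1] = a_hi[1] + b_hi[1];
#endif
}
```

<p>The joins and the final split are pure overhead compared with the
win64 case, where the operands are already contiguous in memory.</p>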
<p>My current branch can be found here:
<a class="moz-txt-link-freetext" href="https://gitlab.com/CuriousKit/optimisations/-/commits/ucomplex-x86-vector">https://gitlab.com/CuriousKit/optimisations/-/commits/ucomplex-x86-vector</a><br>
</p>
<p>Kit<br>
</p>
<div class="moz-cite-prefix">On 23/08/2024 17:56, J. Gareth Moreton
via fpc-devel wrote:<br>
</div>
<blockquote type="cite"
cite="mid:82c22f9f-b3b1-411f-af0d-45e06359b187@moreton-family.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<p>Hi everyone,</p>
<p>So I'm getting ready to showcase my current vector work to
others. I do have a question though...</p>
<p>Currently the feature is locked behind "-Sv", since this is
specifically "support vector processing" and the feature is
still experimental and inefficient in places, but is this the
right approach? I ask because the -S switches are specifically
syntax options, not code generation options (I do wonder exactly
what syntax it enables). Also, at least with the "make" script,
it skips whole program optimisation and some of the packages.</p>
<p>Should I use a compiler definition instead like
"-dX86_VECTORS"? That way, the feature can easily be turned
off.<br>
</p>
<p>Kit<br>
</p>
<div class="moz-cite-prefix">On 21/08/2024 15:59, J. Gareth
Moreton via fpc-devel wrote:<br>
</div>
<blockquote type="cite"
cite="mid:f2645374-bc88-4ea5-835d-3c08490aac19@moreton-family.com">
<meta http-equiv="content-type"
content="text/html; charset=UTF-8">
<p>Hi everyone,</p>
<p>Just thought I'd give a heads-up on what's happening with me
and the compiler improvements. Also, I've been busy with
contract work and have just had some minor surgery, so I'm not
running on all cylinders currently.</p>
<ul>
<li>Still waiting on administrator comments and feedback on my
assembly-level CSE feature (a couple of years old now) and
the first part of pure functions. Both of these should be
ready to merge unless someone found a bug that breaks things
(someone did find some examples with pure functions which
have since been fixed).<br>
</li>
<li>Haven't solved the SEH unwinding problem on aarch64-win64
yet. This is quite a tough one!</li>
<li>Also working on vectorisation for x86_64 platforms. I've
got it working on win64, and can vectorise two-operand
commutative operations like addition and multiplication,
although some of the generated code is less than optimal
(unnecessarily copying vectors to the stack). Linux (and
other OSes that use the System V ABI) is taking a bit longer
since it stores pairs of floats in single XMM registers even
without vectorisation code, and some of the internal
procedures can't properly handle these if the desire is to
combine a pair of such registers (so 4 singles) into a
single XMM vector, especially where shuffling is involved.</li>
</ul>
<p>I'll let you know the progress.</p>
<p>Kit<br>
</p>
<br>
<fieldset class="moz-mime-attachment-header"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
fpc-devel maillist - <a
class="moz-txt-link-abbreviated moz-txt-link-freetext"
href="mailto:fpc-devel@lists.freepascal.org"
moz-do-not-send="true">fpc-devel@lists.freepascal.org</a>
<a class="moz-txt-link-freetext"
href="https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel"
moz-do-not-send="true">https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel</a>
</pre>
</blockquote>
<br>
</blockquote>
</body>
</html>