[fpc-devel] Vectorisation development
J. Gareth Moreton
gareth at moreton-family.com
Sun May 18 22:01:49 CEST 2025
Hi everyone,
Just thought I'd give a heads-up on my latest mad experiments!
I'm currently working to see if I can improve auto-vectorisation within
the compiler. I'm using x86_64 as my starting point since SSE2 is
guaranteed to be present there, but I'm aiming to make it as
cross-platform as possible so it can be ported to AArch64 and the like.
Current status:
* It currently compiles and works, but it generally performs worse than
without auto-vectorisation because the compiler forces everything
into memory (its usual fallback when it doesn't quite know what to
do with a storage type).
* I'm using the ucomplex.pp unit as my test case while also telling the
compiler to use the 'vectorcall' calling convention for all of the
routines. Because the complex type is just two Doubles (128 bits in
total), it fits exactly into a single XMM register.
* I tried to reuse LOC_SUBSETREG and LOC_CSUBSETREG for locations that
occupied specific lanes of an MM register, but this caused problems
since those location types are designed only for integer registers.
Instead, I have created new LOC_MMLANE and LOC_CMMLANE types and an
associated structure within the TLocation union, which are
specifically designed for MM registers (and so don't have to handle
bitpacking). This also allows me to write new methods like
a_loadmm_reg_lane instead of reusing and over-complicating existing
ones. (I also made sure to follow the convention of keeping
LOC_REFERENCE and LOC_CREFERENCE last.)
* Currently I'm only supporting 128-bit MM types. 256-bit and above
will come at a later date.
* Currently auto-vectorisation is always attempted, but later I will
disable it below -O2 or -O3 (I haven't decided which yet).
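As a rough illustration of why a complex value suits XMM, and of what "a location that occupies a specific lane" means, here is a small sketch in C using SSE2 intrinsics. It is not taken from the compiler and is not what FPC emits; the names complex_t, to_xmm, get_lane and cadd are hypothetical, and it assumes an x86_64 (SSE2-capable) C compiler. The real part sits in lane 0 and the imaginary part in lane 1, so one packed add handles both halves at once.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; always available on x86_64 */

/* Hypothetical stand-in for ucomplex's complex type: two Doubles, 128 bits. */
typedef struct {
    double re;
    double im;
} complex_t;

/* Pack a complex value into one 128-bit XMM value.
   _mm_set_pd takes the high lane first, so lane 0 = re, lane 1 = im. */
static __m128d to_xmm(complex_t c)
{
    return _mm_set_pd(c.im, c.re);
}

/* Read a single 64-bit lane; roughly analogous to accessing a value
   whose location is one lane of an MM register. */
static double get_lane(__m128d v, int lane)
{
    assert(lane == 0 || lane == 1);
    if (lane == 0)
        return _mm_cvtsd_f64(v);                 /* low lane directly */
    return _mm_cvtsd_f64(_mm_unpackhi_pd(v, v)); /* move high lane down first */
}

/* Complex addition: a single ADDPD adds the real and imaginary parts together. */
static complex_t cadd(complex_t a, complex_t b)
{
    __m128d sum = _mm_add_pd(to_xmm(a), to_xmm(b));
    complex_t r = { get_lane(sum, 0), get_lane(sum, 1) };
    return r;
}
```

For example, cadd((complex_t){1.5, -2.0}, (complex_t){0.5, 4.0}) yields {2.0, 2.0}. The benefit only appears when values like these stay in registers between operations; if each one is spilled to memory and reloaded, the packing and unpacking overhead can outweigh the single packed add.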
Kit