[fpc-pascal] FPC and SIMD intrinsics

Alecu Ștefan-Iulian overanalytcl at gmail.com
Sun Apr 14 01:19:10 CEST 2024


Hello!

I am planning a high-performance project that involves a lot of math,
which is why I want to use SIMD (AVX2) on x86_64 (and, to be honest,
for fun as well). I come mainly from the C and C++ world, where one has
intrinsics (such as `_mm256_add_epi64`, to take an example from the
Intel® Intrinsics Guide). I am most familiar with GCC (and, to a lesser
extent, with Clang and ICC), where these intrinsics are available
through headers such as <immintrin.h>. Is there a Free Pascal
equivalent?

I am well aware that I can use asm blocks, but some intrinsics expand
to more than one instruction, and in C it is the compiler's
responsibility to pick the best instruction(s) for a given intrinsic.
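For comparison, here is a minimal sketch of the compiler-driven alternative: a plain Pascal version of the lane-wise 64-bit add with no asm at all, so FPC picks the instructions itself. The name `_mm256_add_epi64_pas` is hypothetical (it is not an FPC or RTL routine), and `__m256i` uses the same four-`int64` array layout as the asm attempt further down.

```pascal
program PortableAddEpi64;

{$mode objfpc}{$H+}
{$Q-}  // overflow checking off, so each lane wraps around like vpaddq does

type
  // Four 64-bit lanes = 256 bits, same layout as in the asm attempt.
  __m256i = packed array[0..3] of int64;

// Hypothetical portable fallback: adds the four 64-bit lanes of a and b
// independently, which is what _mm256_add_epi64 computes.
function _mm256_add_epi64_pas(const a, b: __m256i): __m256i;
var
  i: Integer;
begin
  for i := Low(a) to High(a) do
    Result[i] := a[i] + b[i];
end;

var
  a: __m256i = (1, 2, 3, 4);
  b: __m256i = (5, 6, 7, 8);
  r: __m256i;
  e: int64;
begin
  r := _mm256_add_epi64_pas(a, b);
  for e in r do
    Write(e, ' ');
  Writeln;  // prints 6 8 10 12
end.
```

Because it avoids inline assembly, this version runs on any target and can serve as a reference to check an AVX2 path against. Whether FPC turns the loop into vector instructions depends on the compiler version and settings, so it should not be assumed to be as fast as the hand-written asm.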

Basically, can I implement `_mm256_add_epi64` directly, so that it is
equivalent to the same call in C? If not, what would be the best way to
write wrappers for these intrinsics? I tried this:

```pascal
program AVX2Example;

{$mode objfpc}{$H+}{$asmmode intel}

uses
   SysUtils;

type
   __m256i = packed array[0..3] of int64;

function _mm256_loadu_si256(src: __m256i): __m256i; assembler;
asm
     vmovdqu ymm0, ymmword ptr [src]
     vmovdqa [Result], ymm0
end;

function _mm256_add_epi64(a, b: __m256i): __m256i; assembler;
asm
     vmovdqa ymm0, [a]
     vmovdqa ymm1, [b]
     vpaddq ymm0, ymm0, ymm1
     vmovdqa [Result], ymm0
end;

var
   a: __m256i = (1, 2, 3, 4);
   b: __m256i = (5, 6, 7, 8);

   a1, a2: __m256i;
   res: __m256i;
   e: int64;
begin
   a1 := _mm256_loadu_si256(a);
   a2 := _mm256_loadu_si256(b);

   res := _mm256_add_epi64(a1, a2);

   for e in res do
   begin
     Write(e, ' ');
   end;
   Writeln;

end.
```

but it only works half of the time, so something is wrong.
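A likely cause, offered as a hedged suggestion rather than a confirmed diagnosis: `vmovdqa` with a 256-bit operand requires its memory operand to be 32-byte aligned and faults otherwise, and nothing in the program above guarantees that `a`, `b`, the parameters or `Result` are 32-byte aligned. Data that is only 16-byte aligned happens to be 32-byte aligned about half the time, which would fit the symptom. The sketch below keeps the original declarations and operand forms (so it relies on the same assumption the original does, namely that FPC passes these 32-byte arrays and the function result by reference) and only switches every memory access to the unaligned `vmovdqu`. It also adds `vzeroupper` before returning, which is a performance measure against AVX/SSE transition penalties, not a correctness fix.

```pascal
program AVX2ExampleFixed;

{$mode objfpc}{$H+}{$asmmode intel}

type
  // Four 64-bit lanes = 256 bits.
  __m256i = packed array[0..3] of int64;

// Unaligned 32-byte copy from src to Result. vmovdqu is used for BOTH the
// load and the store, since neither address is guaranteed to be 32-byte aligned.
function _mm256_loadu_si256(src: __m256i): __m256i; assembler;
asm
  vmovdqu ymm0, ymmword ptr [src]
  vmovdqu [Result], ymm0
  vzeroupper               // clear upper YMM halves before returning
end;

// Lane-wise 64-bit addition (vpaddq), again using unaligned moves only.
function _mm256_add_epi64(a, b: __m256i): __m256i; assembler;
asm
  vmovdqu ymm0, [a]
  vmovdqu ymm1, [b]
  vpaddq  ymm0, ymm0, ymm1
  vmovdqu [Result], ymm0
  vzeroupper
end;

var
  a: __m256i = (1, 2, 3, 4);
  b: __m256i = (5, 6, 7, 8);
  a1, a2, res: __m256i;
  e: int64;
begin
  a1 := _mm256_loadu_si256(a);
  a2 := _mm256_loadu_si256(b);
  res := _mm256_add_epi64(a1, a2);
  for e in res do
    Write(e, ' ');
  Writeln;  // expected on an AVX2-capable CPU: 6 8 10 12
end.
```

The unaligned forms cost little or nothing on recent CPUs when the data does happen to be aligned, so using them everywhere is the simpler choice unless alignment can be guaranteed.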

Kind regards,
Stefan.

