[fpc-pascal] FPC and SIMD intrinsics
Alecu Ștefan-Iulian
overanalytcl at gmail.com
Sun Apr 14 01:19:10 CEST 2024
Hello!
I would like to build a high-performance project that involves a lot of
math, which is why I am interested in using SIMD (AVX2) on x86_64 (and
for fun as well, if I'm honest). I come mainly from the C and C++ world,
where one has intrinsics (such as `_mm256_add_epi64`, to give an example
from the Intel® Intrinsics Guide). I am most familiar with GCC (and to a
lesser extent with Clang and ICC), where these intrinsics are available
through headers such as <immintrin.h>. Is there a Free Pascal
equivalent for that?
I am well aware that I can use asm blocks, but some intrinsics expand to
more than one instruction, and in C it is the compiler's responsibility
to pick the best instruction sequence for a given intrinsic.
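For reference, here is a minimal sketch of the C side I have in mind, assuming GCC or Clang on x86. The helper names `add4_i64` and `add4_avx2` are made up for this illustration; only the `_mm256_*` intrinsics come from <immintrin.h>. The `target("avx2")` attribute and `__builtin_cpu_supports` are GCC/Clang extensions, and the scalar loop is just a fallback for machines without AVX2.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 path: unaligned loads, one 4 x 64-bit add, unaligned store.
   The target attribute lets this compile without passing -mavx2. */
__attribute__((target("avx2")))
static void add4_avx2(const int64_t *a, const int64_t *b, int64_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);
    _mm256_storeu_si256((__m256i *)out, _mm256_add_epi64(va, vb));
}
#endif

/* Hypothetical wrapper: uses AVX2 when the CPU reports it,
   otherwise falls back to a plain scalar loop. */
void add4_i64(const int64_t a[4], const int64_t b[4], int64_t out[4])
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        add4_avx2(a, b, out);
        return;
    }
#endif
    for (int i = 0; i < 4; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Calling `add4_i64` on (1, 2, 3, 4) and (5, 6, 7, 8) fills `out` with the four lane-wise sums. Note that the loads and the store use the unaligned variants, so the arrays do not need 32-byte alignment.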
Basically, can I implement `_mm256_add_epi64` directly, so that it is
equivalent to doing the same thing in C? If not, what would be the best
way to write wrappers for these intrinsics? I tried this:
```
program AVX2Example;

{$mode objfpc}{$H+}{$asmmode intel}

uses
  SysUtils;

type
  __m256i = packed array[0..3] of int64;

function _mm256_loadu_si256(src: __m256i): __m256i; assembler;
asm
  vmovdqu ymm0, ymmword ptr [src]
  vmovdqa [Result], ymm0
end;

function _mm256_add_epi64(a, b: __m256i): __m256i; assembler;
asm
  vmovdqa ymm0, [a]
  vmovdqa ymm1, [b]
  vpaddq  ymm0, ymm0, ymm1
  vmovdqa [Result], ymm0
end;

var
  a: __m256i = (1, 2, 3, 4);
  b: __m256i = (5, 6, 7, 8);
  a1, a2: __m256i;
  res: __m256i;
  e: int64;
begin
  a1 := _mm256_loadu_si256(a);
  a2 := _mm256_loadu_si256(b);
  res := _mm256_add_epi64(a1, a2);
  for e in res do
  begin
    Write(e, ' ');
  end;
  Writeln;
end.
```
but it only works half of the time, so something is wrong.
Kind regards,
Stefan.