[fpc-devel] The new XMM intrinsics
Sven Barth
pascaldragon at googlemail.com
Sat Jan 18 12:50:46 CET 2020
On 16.01.2020 at 23:22, J. Gareth Moreton wrote:
> Hey everyone,
>
> Maybe I'm being a bit pedantic with this, but must we abide by C/C++
> standards and go by the name __m128 etc. for the 128-bit data type?
> Being as how Pascal tended to go for more readable and BASIC-inspired
> names like Integer and Single, might it be better to name them TM128
> instead? If not that, then is it possible to add a union-like record
> type to the System unit or the inc files that contain all of the
> intrinsics?
I agree that the __xxx names for the SIMD types are a bad choice. In
C/C++ they did this to avoid type conflicts (after all, identifiers
beginning with two underscores are "reserved"), but in Pascal we don't
have this problem: the System types are simply hidden by other units that
declare types with the same name, and they can still be reached by
writing System.TheType.
Thus I personally would prefer more Pascal-style names for these as well
(though I don't think that TXXX is good, because no other primitive type
starts with a T and that's what those types essentially are: primitive,
base types). So maybe simply M128 instead of __m128 would be better (and
analogous for the other types). This would be similar to the "new"
integer aliases: UInt8, Int8, Int32, UInt32, etc.
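As a minimal sketch of the scoping rule described above, assume a hypothetical program that declares its own type named Int32 (the program and the record are made up purely for illustration). The local declaration hides the System one, while the qualified name System.Int32 still reaches the original:

```pascal
program shadowdemo;
{$mode objfpc}

type
  // Hypothetical local type that deliberately reuses a System type name.
  Int32 = record
    Value: LongInt;
  end;

var
  a: Int32;         // resolves to the local record declared above
  b: System.Int32;  // the qualified name still reaches the System alias

begin
  a.Value := 40;
  b := 2;
  WriteLn(a.Value + b);
end.
```

The same lookup would apply to a System-level M128: a unit that declares its own M128 would hide it without causing a conflict.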
>
> My vectorcall tests (e.g. tests\test\cg\tvectorcall1.pp) have
> something like this:
>
> {$PUSH}
> {$CODEALIGN RECORDMIN=16}
> {$PACKRECORDS C}
> type
> TM128 = record
> case Byte of
> 0: (M128_F32: array[0..3] of Single);
> 1: (M128_F64: array[0..1] of Double);
> end;
> {$POP}
>
> Granted, given that __m128 will be automatically aligned, all of the
> codealign directives may not be necessary - for example:
>
> type
> TM128 = record
> case Byte of
> 0: (M128_F32: array[0..3] of Single);
> 2: (M128_F64: array[0..1] of Double);
> 3: (M128_Internal: __m128);
> end;
>
> The main thing I'm thinking about is that it's actually rather
> difficult to modify the elements of a variable of type __m128 directly
> in C/C++ because of the type being opaque and difficult to typecast
> sometimes (some compilers will treat it as an array, others will treat
> it as a record type like the above (Visual C++ does this), while
> others may not allow access to its elements at all). Often, I might
> want to map a 4-component vector with Single-type fields x, y, z and w
> to an aligned __m128 type, or Double-type fields Re and Im when
> dealing with complex numbers. That way, I can read from and write to
> them outside of intrinsic calls.
>
> I suppose I'm suggesting we introduce something more usable than what
> C has so people can actually use intrinsics more easily.
I don't know Florian's plans, but I could very well imagine that code
like the following is going to be valid:
=== code begin ===
var
  i: array[0..3] of LongInt;
  m: __m128i;
begin
  m := i;
  // or
  i := m;
end.
=== code end ===
With that working and type helpers one can implement the following:
=== code begin ===
type
  TM128Helper = type helper for __m128
  public type
    TLongIntIndex = 0..3;
  private type
    TLongIntArray = array[TLongIntIndex] of LongInt;
  private
    procedure SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt); inline;
    function GetAsLongInt(aIndex: TLongIntIndex): LongInt; inline;
  public
    property AsLongInt[Index: TLongIntIndex]: LongInt read GetAsLongInt
      write SetAsLongInt;
  end;
//
procedure TM128Helper.SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt);
begin
  TLongIntArray(Self)[aIndex] := aValue;
end;
function TM128Helper.GetAsLongInt(aIndex: TLongIntIndex): LongInt;
begin
  Result := TLongIntArray(Self)[aIndex];
end;
=== code end ===
This would allow moving those conversions out of compiler magic and into
the runtime library.
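The helper pattern itself can be tried on a type that already exists. The following is only a sketch under stated assumptions: QWord stands in for __m128, and the names TQWordHelper, THalfIndex, THalfArray and Half are hypothetical. It relies on a same-size variable typecast, just as the sketch above does, and says nothing about how the SIMD types will finally behave:

```pascal
program helperdemo;
{$mode objfpc}
{$modeswitch typehelpers}

type
  // Hypothetical helper; QWord is only a stand-in for __m128 here.
  TQWordHelper = type helper for QWord
  public type
    THalfIndex = 0..1;
  private type
    THalfArray = array[THalfIndex] of LongWord;
  private
    function GetHalf(aIndex: THalfIndex): LongWord; inline;
    procedure SetHalf(aIndex: THalfIndex; aValue: LongWord); inline;
  public
    property Half[aIndex: THalfIndex]: LongWord read GetHalf write SetHalf;
  end;

function TQWordHelper.GetHalf(aIndex: THalfIndex): LongWord;
begin
  // Same-size variable typecast: view the 8 bytes as two LongWords.
  Result := THalfArray(Self)[aIndex];
end;

procedure TQWordHelper.SetHalf(aIndex: THalfIndex; aValue: LongWord);
begin
  THalfArray(Self)[aIndex] := aValue;
end;

var
  q: QWord;
begin
  q := 0;
  q.Half[0] := 42;
  q.Half[1] := 7;
  // Reads back the two elements that were just written.
  WriteLn(q.Half[0], ' ', q.Half[1]);
end.
```

Which half of the QWord each index maps to depends on the target's byte order, so the sketch only reads back what it wrote through the same property.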
In fact quite a bit of it is already working now, though the generated
assembly is not yet optimal (but the feature is still work in progress
after all):
=== code begin ===
program tmmtest;
{$mode objfpc}
{$modeswitch typehelpers}
type
  TM128Helper = type helper for __m128
  public type
    TLongIntIndex = 0..3;
  private type
    TLongIntArray = array[0..3] of LongInt;
  private
    procedure SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt);
      inline; vectorcall;
    function GetAsLongInt(aIndex: TLongIntIndex): LongInt; inline;
      vectorcall;
  public
    property AsLongInt[Index: TLongIntIndex]: LongInt read GetAsLongInt
      write SetAsLongInt;
  end;
procedure TM128Helper.SetAsLongInt(aIndex: TLongIntIndex; aValue:
  LongInt); vectorcall;
var
  arr: TLongIntArray;
begin
  x86_movups(@arr[0], Self);
  arr[aIndex] := aValue;
  // triggers internal error 200310081
  //Self := x86_movups(@arr[0]);
end;
function TM128Helper.GetAsLongInt(aIndex: TLongIntIndex): LongInt;
  vectorcall;
var
  arr: TLongIntArray;
begin
  x86_movups(@arr[0], Self);
  Result := arr[aIndex];
end;
procedure Test;
var
  m: __m128;
  i: LongInt;
begin
  m.AsLongInt[0] := 42;
  i := m.AsLongInt[0];
end;
begin
  Test;
end.
=== code end ===
The generated assembly for Test is this:
=== code begin ===
# Var m located at rbp-16, size=OS_M128
# Var i located at rbp-20, size=OS_S32
# [42] m.AsLongInt[0] := 42;
leaq -36(%rbp),%rax
movdqa -16(%rbp),%xmm0
movups %xmm0,(%rax)
movl $42,-36(%rbp)
# [43] i := m.AsLongInt[0];
leaq -36(%rbp),%rax
movdqa -16(%rbp),%xmm0
movups %xmm0,(%rax)
movl -36(%rbp),%eax
movl %eax,-20(%rbp)
# [44] end;
=== code end ===
Regards,
Sven