[fpc-devel] The new XMM intrinsics

Sat Jan 18 12:50:46 CET 2020

On 16.01.2020 at 23:22, J. Gareth Moreton wrote:
> Hey everyone,
>
> Maybe I'm being a bit pedantic with this, but must we abide by C/C++ 
> standards and go by the name __m128 etc. for the 128-bit data type?  
> Being as how Pascal tended to go for more readable and BASIC-inspired 
> names like Integer and Single, might it be better to name them TM128 
> instead?  If not that, then is it possible to add a union-like record 
> type to the System unit or the inc files that contain all of the 
> intrinsics?

I agree that the __xxx names for the SIMD types are a bad choice. C/C++ 
uses them to avoid conflicts with user types (identifiers starting with 
two underscores are "reserved" for the implementation, after all). In 
Pascal we don't have this problem: a type declared in the System unit is 
hidden by any other unit that declares a type of the same name, yet it 
can still be reached as System.TheType.
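This hiding rule can already be seen with an existing type. In {$mode objfpc} the ObjPas unit is loaded implicitly after System and redeclares Integer as a 32-bit type. Unqualified uses of Integer therefore no longer refer to the System unit's 16-bit Integer, but that type stays reachable as System.Integer. Here is a minimal sketch, assuming a current FPC release:

```pascal
program shadowdemo;

{$mode objfpc}  // implicitly adds the ObjPas unit after System

begin
  // ObjPas declares Integer = LongInt (32 bit), hiding System.Integer
  // (a 16-bit SmallInt) for every unqualified use of the name.
  WriteLn('Integer:        ', SizeOf(Integer));
  WriteLn('ObjPas.Integer: ', SizeOf(ObjPas.Integer));
  // The System unit's own declaration remains reachable when qualified.
  WriteLn('System.Integer: ', SizeOf(System.Integer));
end.
```

A System-level SIMD type with a Pascal-style name would presumably be shadowed and recovered in the same way.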

Thus I personally would prefer more Pascal-style names for these as well 
(though I don't think TXXX is a good fit, because no other primitive type 
starts with a T, and primitive base types are essentially what these 
are). So perhaps simply M128 instead of __m128 (and analogously for the 
other types). This would be similar to the "new" integer aliases: UInt8, 
Int8, Int32, UInt32, etc.
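For comparison, the integer aliases mentioned above are plain aliases and not distinct types. The System unit declares Int32 as another name for LongInt. The sketch below checks this by passing an Int32 variable to a var LongInt parameter, which Pascal accepts only for identical types. A Pascal-style SIMD name could presumably be introduced the same way, for example as M128 = __m128. That declaration is hypothetical and not something FPC currently provides.

```pascal
program aliasdemo;

{$mode objfpc}

// Increments a LongInt in place; a var parameter requires an identical type.
procedure Bump(var v: LongInt);
begin
  Inc(v);
end;

var
  a: Int32;  // System alias for LongInt
begin
  a := 41;
  Bump(a);   // compiles only because Int32 and LongInt are the same type
  WriteLn(a, ' ', SizeOf(Int32) = SizeOf(LongInt));
end.
```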

>
> My vectorcall tests (e.g. tests\test\cg\tvectorcall1.pp) have 
> something like this:
>
> {$PUSH}
> {$CODEALIGN RECORDMIN=16}
> {$PACKRECORDS C}
> type
>   TM128 = record
>     case Byte of
>       0: (M128_F32: array[0..3] of Single);
>       1: (M128_F64: array[0..1] of Double);
>   end;
> {$POP}
>
> Granted, given that __m128 will be automatically aligned, all of the 
> codealign directives may not be necessary - for example:
>
> type
>   TM128 = record
>     case Byte of
>       0: (M128_F32: array[0..3] of Single);
>       2: (M128_F64: array[0..1] of Double);
>       3: (M128_Internal: __m128);
>   end;
>
> The main thing I'm thinking about is that it's actually rather 
> difficult to modify the elements of a variable of type __m128 directly 
> in C/C++ because of the type being opaque and difficult to typecast 
> sometimes (some compilers will treat it as an array, others will treat 
> it as a record type like the above (Visual C++ does this), while 
> others may not allow access to its elements at all).  Often, I might 
> want to map a 4-component vector with Single-type fields x, y, z and w 
> to an aligned __m128 type, or Double-type fields Re and Im when 
> dealing with complex numbers. That way, I can read from and write to 
> them outside of intrinsic calls.
>
> I suppose I'm suggesting we introduce something more usable than what 
> C has so people can actually use intrinsics more easily.
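The union-like record Gareth describes can be sketched in portable Pascal today. The record below is a hypothetical stand-in: TVec4 and its field names are illustrative and not part of FPC. It overlays the Single, Double and LongInt views with the x/y/z/w and Re/Im views from his message. Unlike a real __m128, this sketch is not guaranteed to be 16-byte aligned; his test uses {$CODEALIGN RECORDMIN=16} for that.

```pascal
program vec4union;

{$mode objfpc}
{$PACKRECORDS C}

type
  // Hypothetical 16-byte union giving named access to the lanes of a
  // 128-bit value; all variants share the same storage.
  TVec4 = record
    case Byte of
      0: (F32: array[0..3] of Single);
      1: (F64: array[0..1] of Double);
      2: (I32: array[0..3] of LongInt);
      3: (X, Y, Z, W: Single);  // 4-component vector view
      4: (Re, Im: Double);      // complex-number view
  end;

var
  v: TVec4;
begin
  v.X := 1.0;
  v.Y := 2.0;
  v.Z := 3.0;
  v.W := 4.0;
  WriteLn(SizeOf(TVec4));         // size of the largest variant
  WriteLn(v.F32[2]:0:1);          // same storage as v.Z
  WriteLn(HexStr(v.I32[0], 8));   // IEEE-754 bit pattern of 1.0
  v.Re := 0.5;
  v.Im := -2.0;
  WriteLn(v.F64[1]:0:1);          // same storage as v.Im
end.
```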

I don't know Florian's plans, but I can well imagine that code like the 
following will be valid:

=== code begin ===

var
   i: array[0..3] of LongInt;
   m: __m128i;
begin
   m := i;
   // or
   i := m;
end.

=== code end ===

With that working, type helpers could then be used to implement the following:

=== code begin ===

type
   TM128Helper = type helper for __m128
   public type
     TLongIntIndex = 0..3;
   private type
     TLongIntArray = array[TLongIntIndex] of LongInt;
   private
     procedure SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt); inline;
     function GetAsLongInt(aIndex: TLongIntIndex): LongInt; inline;
   public
     property AsLongInt[Index: TLongIntIndex]: LongInt
       read GetAsLongInt write SetAsLongInt;
   end;

//

procedure TM128Helper.SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt);
begin
   TLongIntArray(Self)[aIndex] := aValue;
end;

function TM128Helper.GetAsLongInt(aIndex: TLongIntIndex): LongInt;
begin
   Result := TLongIntArray(Self)[aIndex];
end;

=== code end ===

This would move those conversions out of compiler magic and into the 
runtime library.
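The helper approach does not depend on the new intrinsics, so it can be tried with any recent FPC by substituting a plain 16-byte record for __m128. In the sketch below, TRaw128, TRaw128Helper and PLongIntArray are hypothetical names. Lane access reinterprets the record's storage through a typed pointer instead of the TLongIntArray(Self) typecast used above. Because TRaw128 is a record, the sketch uses a record helper, enabled here together with the advancedrecords modeswitch.

```pascal
program helpersketch;

{$mode objfpc}
{$modeswitch advancedrecords}

type
  // Hypothetical portable stand-in for an opaque 128-bit SIMD type.
  TRaw128 = record
    Bytes: array[0..15] of Byte;
  end;

  TLongIntIndex = 0..3;
  TLongIntArray = array[TLongIntIndex] of LongInt;
  PLongIntArray = ^TLongIntArray;

  // The lane conversions live in ordinary library code, not in the compiler.
  TRaw128Helper = record helper for TRaw128
  private
    procedure SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt); inline;
    function GetAsLongInt(aIndex: TLongIntIndex): LongInt; inline;
  public
    property AsLongInt[Index: TLongIntIndex]: LongInt
      read GetAsLongInt write SetAsLongInt;
  end;

procedure TRaw128Helper.SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt);
begin
  // View the 16 bytes as four LongInts and overwrite one lane in place
  PLongIntArray(@Self)^[aIndex] := aValue;
end;

function TRaw128Helper.GetAsLongInt(aIndex: TLongIntIndex): LongInt;
begin
  // Read one lane through the same reinterpretation
  Result := PLongIntArray(@Self)^[aIndex];
end;

var
  m: TRaw128;
  i: TLongIntIndex;
begin
  FillChar(m, SizeOf(m), 0);
  m.AsLongInt[0] := 42;
  m.AsLongInt[3] := -1;
  for i := Low(TLongIntIndex) to High(TLongIntIndex) do
    WriteLn('lane ', i, ' = ', m.AsLongInt[i]);
end.
```

Both accessors are marked inline, so with optimisations enabled the compiler can expand the helper calls at the call site.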

In fact, quite a bit of this already works, though the generated 
assembly is not yet optimal (the feature is still a work in progress, 
after all):

=== code begin ===

program tmmtest;

{$mode objfpc}
{$modeswitch typehelpers}

type
   TM128Helper = type helper for __m128
   public type
     TLongIntIndex = 0..3;
   private type
     TLongIntArray = array[0..3] of LongInt;
   private
     procedure SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt); inline; vectorcall;
     function GetAsLongInt(aIndex: TLongIntIndex): LongInt; inline; vectorcall;
   public
     property AsLongInt[Index: TLongIntIndex]: LongInt
       read GetAsLongInt write SetAsLongInt;
   end;

procedure TM128Helper.SetAsLongInt(aIndex: TLongIntIndex; aValue: LongInt); vectorcall;
var
   arr: TLongIntArray;
begin
   x86_movups(@arr[0], Self);
   arr[aIndex] := aValue;
   // writing the array back currently triggers internal error 200310081
   //Self := x86_movups(@arr[0]);
end;

function TM128Helper.GetAsLongInt(aIndex: TLongIntIndex): LongInt; vectorcall;
var
   arr: TLongIntArray;
begin
   x86_movups(@arr[0], Self);
   Result := arr[aIndex];
end;

procedure Test;
var
   m: __m128;
   i: LongInt;
begin
   m.AsLongInt[0] := 42;
   i := m.AsLongInt[0];
end;

begin
   Test;
end.

=== code end ===

The generated assembly for Test is this:

=== code begin ===

# Var m located at rbp-16, size=OS_M128
# Var i located at rbp-20, size=OS_S32
# [42] m.AsLongInt[0] := 42;
     leaq    -36(%rbp),%rax
     movdqa    -16(%rbp),%xmm0
     movups    %xmm0,(%rax)
     movl    $42,-36(%rbp)
# [43] i := m.AsLongInt[0];
     leaq    -36(%rbp),%rax
     movdqa    -16(%rbp),%xmm0
     movups    %xmm0,(%rax)
     movl    -36(%rbp),%eax
     movl    %eax,-20(%rbp)
# [44] end;

=== code end ===

Regards,
Sven