[fpc-devel] Problems with MM types (__m128 etc).

Wed Apr 6 19:20:46 CEST 2022

Hi everyone,

I recently made a merge request that initially just fixed the incorrect 
memory alignment for __m128 and similar types, but doing so revealed a 
whole plethora of other bugs.  First, once the alignment was fixed, 
__m128 etc. were no longer recognised as a valid SIMD or aggregate type, 
because the wrong alignment field was being checked at one point.  
Second, some tests with vectorcall revealed bad code being generated in 
places.

This may have to be a long work in progress.  I've also found another bug:

program m128test;

function Test3(V1, V2: __m128d): __m128d; vectorcall;
begin
   Test3[0] := V1[0] + V2[0];
   Test3[1] := V1[1] + V2[1];
end;

begin
end.

This will raise Internal error 200410108 under -O2 when compiled under 
x86_64-win64.  It only occurs with __m128d, not __m128 or __m128i 
(although __m128i seems to have its own problems).  My merge request 
fixes the internal error, but produces bad code instead.  When using 
__m128 or __m128i instead, the following assembly language is produced 
under -O2:

.section .text.n_p$m128test_$$_test1$__m128$__m128$$__m128,"ax"
     .balign 16,0x90
.globl    P$M128TEST_$$_TEST1$__M128$__M128$$__M128
P$M128TEST_$$_TEST1$__M128$__M128$$__M128:
.seh_proc P$M128TEST_$$_TEST1$__M128$__M128$$__M128
     leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
     movq    %rcx,%rax
     movq    %xmm1,(%rsp)
     movq    %xmm2,8(%rsp)
     movq    %xmm3,16(%rsp)
     movq    %xmm4,24(%rsp)
     movss    (%rsp),%xmm0
     addss    16(%rsp),%xmm0
     movss    %xmm0,(%rax)
     movss    4(%rsp),%xmm0
     addss    20(%rsp),%xmm0
     movss    %xmm0,4(%rax)
     leaq    40(%rsp),%rsp
     ret
.seh_endproc

The fact that the same code is produced for __m128i, which is meant to 
hold integers, is worrying.  That aside, this code is clearly wrong, 
even ignoring the fact that the parameters are being passed on the stack 
instead of through registers and that %rcx seems to refer to a hidden 
parameter that's a pointer to the result.  -sr reveals that V1 is at 
(%rsp) and V2 is at 16(%rsp), but the first thing the function does is 
overwrite their contents with undefined values (the four movq 
instructions).  If the operands were reversed, so that the stack slots 
were loaded into the registers, this would seem more logical.
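
A run-time check along the following lines might help confirm whether 
the generated code is actually wrong.  This is only a sketch: Test1 is 
assumed to mirror Test3 above with __m128 in place of __m128d (the 
listing only shows its mangled name), and the program name, variable 
names and test values are made up for illustration.

```pascal
program m128check;

{ Hypothetical check program.  Test1 is assumed to have the same body as
  Test3 in the report, but operating on __m128 (four singles). }

function Test1(V1, V2: __m128): __m128; vectorcall;
begin
   { Only the first two lanes are added, matching the two movss/addss
     pairs in the assembly listing. }
   Test1[0] := V1[0] + V2[0];
   Test1[1] := V1[1] + V2[1];
end;

var
   A, B, R: __m128;
begin
   { Arbitrary values that are exactly representable as singles. }
   A[0] := 1.0;  A[1] := 2.0;  A[2] := 0.0; A[3] := 0.0;
   B[0] := 10.0; B[1] := 20.0; B[2] := 0.0; B[3] := 0.0;

   R := Test1(A, B);

   { Lanes 2 and 3 of the result are never written, so they are not checked. }
   if (R[0] = 11.0) and (R[1] = 22.0) then
      WriteLn('ok')
   else
   begin
      WriteLn('FAIL: got ', R[0]:0:1, ', ', R[1]:0:1);
      Halt(1);
   end;
end.
```

Building this with -O2 for x86_64-win64 and comparing against an -O- 
build should show whether the stores at the top of the function clobber 
the arguments before they are read.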

Gareth aka. Kit

P.S. I started making this fix to aid with vectorisation development.
