[fpc-devel] Problems with MM types (__m128 etc).
J. Gareth Moreton
gareth at moreton-family.com
Wed Apr 6 19:20:46 CEST 2022
Hi everyone,
I recently made a merge request that initially just fixed the incorrect
memory alignment for __m128 and similar types, but doing so revealed a
whole plethora of other bugs. First, once the alignment was fixed,
__m128 etc. were no longer recognised as a valid SIMD or aggregate type
because the wrong alignment field was being checked at one point, and
some tests with vectorcall revealed bad code being generated in places.
This may have to be a long work in progress. I've also found another bug:
program m128test;

function Test3(V1, V2: __m128d): __m128d; vectorcall;
begin
  Test3[0] := V1[0] + V2[0];
  Test3[1] := V1[1] + V2[1];
end;

begin
end.
This raises Internal error 200410108 when compiled with -O2 for
x86_64-win64. It only occurs with __m128d, not __m128 or __m128i
(although __m128i seems to have its own problems). My merge request
fixes the internal error, but produces bad code instead. When using
__m128 or __m128i instead, the following assembly language is produced
under -O2:
.section .text.n_p$m128test_$$_test1$__m128$__m128$$__m128,"ax"
	.balign 16,0x90
.globl	P$M128TEST_$$_TEST1$__M128$__M128$$__M128
P$M128TEST_$$_TEST1$__M128$__M128$$__M128:
.seh_proc P$M128TEST_$$_TEST1$__M128$__M128$$__M128
	leaq	-40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
	movq	%rcx,%rax
	movq	%xmm1,(%rsp)
	movq	%xmm2,8(%rsp)
	movq	%xmm3,16(%rsp)
	movq	%xmm4,24(%rsp)
	movss	(%rsp),%xmm0
	addss	16(%rsp),%xmm0
	movss	%xmm0,(%rax)
	movss	4(%rsp),%xmm0
	addss	20(%rsp),%xmm0
	movss	%xmm0,4(%rax)
	leaq	40(%rsp),%rsp
	ret
.seh_endproc
The fact that the same code is produced for __m128i, which is meant to
hold integers, is worrying. That aside, this code is clearly wrong
(even ignoring the fact that the parameters are being passed on the
stack instead of through registers, and that %rcx seems to refer to a
hidden parameter that's a pointer to the result). Compiling with -sr
reveals that V1 is at (%rsp) and V2 is at 16(%rsp), but the first thing
that happens is that their contents are overwritten with undefined
values (by the four movq instructions). If the operands of those
instructions were reversed, this would seem more logical.
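A small run-time check along the following lines could help confirm
whether the values actually arrive intact once the internal error is out
of the way. It is only a sketch: the program name m128check, the
variables A, B and R, and the sample values are made up for
illustration, and it assumes that __m128d variables can be indexed in
the same way as the parameters and result in the test above. Given the
code generation problems described, it may well fail to compile or
misbehave on current trunk.

```pascal
program m128check;

{ Same routine as in the report above. }
function Test3(V1, V2: __m128d): __m128d; vectorcall;
begin
  Test3[0] := V1[0] + V2[0];
  Test3[1] := V1[1] + V2[1];
end;

var
  { Hypothetical names, chosen only for this sketch. }
  A, B, R: __m128d;
begin
  { Sample values whose sums are exactly representable as doubles. }
  A[0] := 1.0;
  A[1] := 2.0;
  B[0] := 10.0;
  B[1] := 20.0;

  R := Test3(A, B);

  { Exit with a non-zero code if either lane has the wrong sum. }
  if (R[0] <> 11.0) or (R[1] <> 22.0) then
    Halt(1);

  WriteLn('ok');
end.
```

Building it with and without -O2 and comparing the exit codes would show
whether the optimiser or the parameter passing is at fault.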
Gareth aka. Kit
P.S. I started making this fix to aid with vectorisation development.