[fpc-devel] Double-check Linux 64-bit SSE return value handling
J. Gareth Moreton
gareth at moreton-family.com
Sun Dec 3 01:01:52 CET 2017
Hi everyone,
This is not something I'm able to check myself because I don't have Linux or a suitable virtual machine
installed, but according to another assembler programmer, the compiler may be doing something odd on
64-bit Linux when the return type is an array of 4 Singles (stored as a record type).
Consider the following code (Intel syntax):
type
  TGLZVector4f = packed array[0..3] of Single;

class operator TGLZVector4f.+(constref A, B: TGLZVector4f): TGLZVector4f; register; assembler; nostackframe;
asm
  MOVUPS XMM0,[A]
  MOVUPS XMM1,[B]
  ADDPS  XMM0,XMM1
  {$ifdef UNIX}
    {$ifdef CPU64}
  MOVHLPS XMM1, XMM0
    {$else}
  MOVUPS [RESULT], XMM0
    {$endif}
  {$else}
  MOVUPS [RESULT], XMM0
  {$endif}
end;
This vector code was written by dicepd on the forum
(http://forum.lazarus.freepascal.org/index.php/topic,32741.msg267708.html#msg267708). He had to add a
workaround for 64-bit Linux (UNIX) because specifying "nostackframe" caused problems: it removed the
function epilogue which, according to dicepd, was this (AT&T syntax):
movups %xmm0,-0x10(%rbp)
movq -0x10(%rbp),%xmm0
movq -0x8(%rbp),%xmm1
(MOVHLPS XMM1, XMM0 is a more efficient way of doing the same thing: it copies the upper two Singles of
XMM0 into the lower half of XMM1 without going through memory.)
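As a rough check of that equivalence, the sketch below models each XMM register as four Single lanes and
compares the epilogue quoted above with the MOVHLPS shortcut. It is only a lane-level model in plain
Pascal, not the real instructions, and the names TXmm, MakeXmm, LoadLow64, MovHLPS and ReturnLanesMatch
are hypothetical helpers made up for illustration. It compares only the lower halves of XMM0 and XMM1,
on the assumption that those are the lanes the caller reads.

```pascal
program ReturnLanes;

{$mode objfpc}

type
  { Hypothetical model: one 128-bit XMM register as four Single lanes. }
  TXmm = array[0..3] of Single;

function MakeXmm(A, B, C, D: Single): TXmm;
begin
  Result[0] := A;
  Result[1] := B;
  Result[2] := C;
  Result[3] := D;
end;

{ Models "movq mem,%xmm": loads two Singles into lanes 0..1 and zeroes lanes 2..3. }
function LoadLow64(const Mem: TXmm; FirstLane: Integer): TXmm;
begin
  Result := MakeXmm(Mem[FirstLane], Mem[FirstLane + 1], 0.0, 0.0);
end;

{ Models "MOVHLPS Dst, Src": copies Src lanes 2..3 into Dst lanes 0..1.
  Dst lanes 2..3 keep their previous contents. }
procedure MovHLPS(var Dst: TXmm; const Src: TXmm);
begin
  Dst[0] := Src[2];
  Dst[1] := Src[3];
end;

{ True when both sequences leave the same values in the lower halves of XMM0 and XMM1. }
function ReturnLanesMatch(const Sum: TXmm): Boolean;
var
  Stack, Xmm0, Xmm1, Alt0, Alt1: TXmm;
begin
  { Epilogue quoted above: spill XMM0 to the stack, then reload each half with movq. }
  Stack := Sum;
  Xmm0 := LoadLow64(Stack, 0);
  Xmm1 := LoadLow64(Stack, 2);

  { Shortcut: XMM0 is left untouched; XMM1 starts with arbitrary contents. }
  Alt0 := Sum;
  Alt1 := MakeXmm(-1.0, -1.0, -1.0, -1.0);
  MovHLPS(Alt1, Alt0);

  Result := (Xmm0[0] = Alt0[0]) and (Xmm0[1] = Alt0[1]) and
            (Xmm1[0] = Alt1[0]) and (Xmm1[1] = Alt1[1]);
end;

begin
  WriteLn('Lower halves match: ', ReturnLanesMatch(MakeXmm(1.0, 2.0, 3.0, 4.0)));
end.
```

The two sequences do differ in the upper halves: movq zeroes the upper 64 bits of its destination,
whereas the shortcut leaves the third and fourth sums in the upper half of XMM0 and whatever was already
in the upper half of XMM1. The model deliberately ignores those lanes.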
Some optimisation is clearly taking place, because the compiler recognises the structure as an array of
primitive types and packs the first two Singles into the same XMM register (rather than using 4 separate
XMM registers), but it then puts the upper two Singles into a second XMM register. The rules of the
x86_64 ABI - http://refspecs.linuxbase.org/elf/x86_64-abi-0.21.pdf - pages 15-17, specify that the equivalent
of a __float128 or __m128 should be split into two classes, with the lower half being SSE and the upper half
being SSEUP, hence the two can be merged into a single XMM register.
I may be barking up the wrong tree, but it looks like a minor violation of the ABI, or at the very least, a
source of unnecessary inefficiency.
Yours faithfully,
J. Gareth "Kit" Moreton