[fpc-devel] Double-check Linux 64-bit SSE return value handling

J. Gareth Moreton gareth at moreton-family.com
Sun Dec 3 01:01:52 CET 2017


Hi everyone,

This is not something I'm able to check myself, because I don't have Linux or a suitable virtual machine
installed, but according to another assembler programmer, the compiler is possibly doing something odd on 64-bit
Linux when the return type is an array of 4 Singles (stored as a record type):

Consider the following code (Intel syntax):

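{$mode objfpc}{$modeswitch advancedrecords}
{$asmmode intel}

type
  { A class operator needs a record, so the array is wrapped in one.
    The field name V is illustrative; the forum code may use another layout. }
  TGLZVector4f = packed record
    V: packed array[0..3] of Single;
    class operator +(constref A, B: TGLZVector4f): TGLZVector4f;
  end;

class operator TGLZVector4f.+(constref A, B: TGLZVector4f): TGLZVector4f; register; assembler; nostackframe;
asm
  MOVUPS XMM0,[A]
  MOVUPS XMM1,[B]
  ADDPS  XMM0,XMM1
  {$ifdef UNIX}
    {$ifdef CPU64}
    { Result returned split: Singles 0-1 in XMM0, Singles 2-3 moved into XMM1 }
    MOVHLPS XMM1, XMM0
    {$else}
    MOVUPS [RESULT], XMM0
    {$endif}
  {$else}
    MOVUPS [RESULT], XMM0
  {$endif}
end;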

This was some vector code written by dicepd on the forum (
http://forum.lazarus.freepascal.org/index.php/topic,32741.msg267708.html#msg267708 ), and he had to use a
workaround for 64-bit Linux (UNIX) because specifying "nostackframe" caused problems: it removed the function
epilogue, which is what moves the result from the stack into the return registers. According to dicepd, that
epilogue was this (AT&T syntax):

movups %xmm0,-0x10(%rbp)
movq   -0x10(%rbp),%xmm0
movq   -0x8(%rbp),%xmm1

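(MOVHLPS XMM1, XMM0 is a more efficient way of doing the same thing: it copies the upper two Singles of XMM0
into the lower half of XMM1 with one register-to-register instruction, instead of storing XMM0 to the stack and
reloading both halves.)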
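For comparison, the same operator written in plain Pascal (no asm block, no "nostackframe") leaves the
prologue, the epilogue quoted above and the choice of return registers entirely to the compiler, so no
per-platform $ifdef is needed. This is only a minimal sketch: the record layout with a field named V is an
assumption, not dicepd's actual declaration.

```pascal
program Vec4AddSketch;

{$mode objfpc}{$modeswitch advancedrecords}

type
  { Hypothetical layout: the field name V is illustrative only. }
  TGLZVector4f = packed record
    V: packed array[0..3] of Single;
    class operator +(constref A, B: TGLZVector4f): TGLZVector4f;
  end;

{ No asm and no nostackframe: the compiler generates its own prologue and
  epilogue and returns the 16-byte result however its calling convention
  dictates on the target platform. }
class operator TGLZVector4f.+(constref A, B: TGLZVector4f): TGLZVector4f;
var
  I: Integer;
begin
  for I := 0 to 3 do
    Result.V[I] := A.V[I] + B.V[I];
end;

var
  A, B, C: TGLZVector4f;
  I: Integer;
begin
  for I := 0 to 3 do
  begin
    A.V[I] := I + 1;          { 1, 2, 3, 4 }
    B.V[I] := 10 * (I + 1);   { 10, 20, 30, 40 }
  end;
  C := A + B;                 { element-wise sums 11, 22, 33, 44 }
  for I := 0 to 3 do
    Write(C.V[I]:0:1, ' ');
  WriteLn;
end.
```

The trade-off is that the compiler may not turn the loop into a single ADDPS, which is presumably why the
hand-written version exists in the first place.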

Obviously some optimisation is taking place: the compiler recognises the structure as an array of primitive
types and puts the first two Singles into the same XMM register (rather than into 4 separate XMM registers),
but then it puts the upper two Singles into a second XMM register. The rules of the x86_64 ABI -
http://refspecs.linuxbase.org/elf/x86_64-abi-0.21.pdf - pages 15-17, specify that a __float128 or __m128 is
split into two classes, with the lower half being SSE and the upper half being SSEUP, so both halves are
returned together in a single XMM register.
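One way to double-check the pages cited above is to apply the classification by hand. The program below is a
simplified sketch, not the compiler's code: it assumes only two cases matter (an aggregate made purely of
4-byte floats, and __m128), and the names TArgClass, ClassifySingleAggregate, ClassifyM128 and ReturnLocation
are invented for illustration. In this reading, SSEUP comes from the vector type itself, while each eightbyte
of an ordinary float aggregate is merged to SSE; the ABI document remains the authority.

```pascal
program AbiClassifySketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  { Subset of the x86_64 ABI classes relevant here (names are hypothetical). }
  TArgClass = (acNoClass, acSSE, acSSEUp, acMemory);
  TEightbytes = array[0..1] of TArgClass;

{ Sketch of the merge rules restricted to aggregates of 4-byte floats:
  each field is merged into the eightbyte containing it, and merging
  NO_CLASS with SSE gives SSE. }
function ClassifySingleAggregate(FieldCount: Integer): TEightbytes;
var
  I: Integer;
begin
  Result[0] := acNoClass;
  Result[1] := acNoClass;
  if FieldCount * SizeOf(Single) > 16 then
  begin
    { Larger than two eightbytes: class MEMORY. }
    Result[0] := acMemory;
    Result[1] := acMemory;
    Exit;
  end;
  for I := 0 to FieldCount - 1 do
    Result[(I * SizeOf(Single)) div 8] := acSSE;
end;

{ __m128 (like __float128) is classified as SSE followed by SSEUP. }
function ClassifyM128: TEightbytes;
begin
  Result[0] := acSSE;
  Result[1] := acSSEUp;
end;

{ Return-value rule: SSE takes the next of XMM0, XMM1; SSEUP goes into
  the upper half of the last SSE register used. }
function ReturnLocation(const C: TEightbytes): string;
var
  I, NextXmm: Integer;
begin
  Result := '';
  NextXmm := 0;
  for I := 0 to 1 do
    case C[I] of
      acSSE:
        begin
          Result := Result + 'XMM' + IntToStr(NextXmm) + '.lo ';
          Inc(NextXmm);
        end;
      acSSEUp:
        Result := Result + 'XMM' + IntToStr(NextXmm - 1) + '.hi ';
      acMemory:
        Exit('memory');
      acNoClass: ;
    end;
  Result := Trim(Result);
end;

begin
  WriteLn('packed array[0..3] of Single: ', ReturnLocation(ClassifySingleAggregate(4)));
  WriteLn('__m128:                       ', ReturnLocation(ClassifyM128));
end.
```

Under these assumptions the first line reports "XMM0.lo XMM1.lo" and the second "XMM0.lo XMM0.hi", so the
question comes down to whether a packed array of Singles should be treated like __m128.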

I may be barking up the wrong tree, but it looks like a minor violation of the ABI, or at the very least a
source of unnecessary inefficiency.

Yours faithfully,

J. Gareth "Kit" Moreton


