[fpc-devel] SSE operation addps doesn't work in subroutine

Michael Müller mueller_michael at nikocity.de
Sun Jan 13 01:36:45 CET 2008


Hi,

Today I tried to use SSE operations, but I couldn't get them to do what I want.

Here is a simple example:

program SSETestsSimple;

{$mode objfpc}{$H+}

{$ALIGN 16}

uses
   SysUtils,
   MMX;

type
   TSSESingle = array[0..3] of Single;

{$ASMMODE ATT}

procedure SSEAdd(const v1, v2: TSSESingle; out v_r: TSSESingle); // assembler;
begin
   writeln(Format('@v1: %s, @v2: %s, @v_r: %s',
     [HexStr(Addr(v1)), HexStr(Addr(v2)), HexStr(Addr(v_r))]));
   asm
     movups v_r, %xmm0 // Just for testing
     movups v2, %xmm0  // Just for testing
     movups v1, %xmm0
//    addps  v2, %xmm0
     movups %xmm0, v_r
   end;
end;

var
   A,
   B,
   C: TSSESingle;
   I: Integer;
begin
   writeln('is_sse_cpu: ', BoolToStr(is_sse_cpu, True));

   for I := Low(A) to High(A) do begin
     A[I] := 2 * I;
     B[I] := 3 * I;
   end;

   writeln('Doing SSE in main program...');
   writeln(Format('@A: %s, @B: %s, @C: %s',
     [HexStr(Addr(A)), HexStr(Addr(B)), HexStr(Addr(C))]));
   asm
     movups A, %xmm0
     addps  B, %xmm0
     movups %xmm0, C
   end;
   writeln('Works');

   writeln('Doing SSE in subroutine...');
   SSEAdd(A, B, C);
   writeln('Works');

   for I := Low(C) to High(C) do
     write(C[I]:10:1);
   writeln;

   readln;
end.

I have three questions:

1.) Florian, in the mailing list archive I found an answer from 2004 in
which you said that 'FPC doesn't align the stack properly to
sixteen byte boundaries currently.'. It still seems to be a problem,
even with {$ALIGN 16}, right?

2.) I created global variables so that I can influence their position
by adding dummy variables (which is not needed anymore in this simple
example), and I pass them as const and out parameters so that the
global variables are used directly (see the addresses in the output).
The assembler lines in the main program work, but in SSEAdd() the
addps line causes a SIGSEGV. Can somebody tell me why it doesn't work
in the subroutine?

With the addps line in SSEAdd():

(gdb) run
Starting program: /home/mm/Development/SSETests/ssetestssimple
is_sse_cpu: True
Doing SSE in main program...
@A: 08085BE0, @B: 08085BF0, @C: 08085C00
Works
Doing SSE in subroutine...
@v1: 08085BE0, @v2: 08085BF0, @v_r: 08085C00

Program received signal SIGSEGV, Segmentation fault.
SSEADD (V1={0, 2, 4, 6}, V2={0, 3, 6, 9}, V_R={0, 5, 10, 15})
     at ssetestssimple.pas:23
23	    addps  v2, %xmm0

Btw: Without the addps line in SSEAdd(), the program crashes when
exiting (after hitting Enter):

(gdb) run
Starting program: /home/mm/Development/SSETests/ssetestssimple
is_sse_cpu: True
Doing SSE in main program...
@A: 080AE290, @B: 080AE2A0, @C: 080AE2B0
Works
Doing SSE in subroutine...
@v1: 080AE290, @v2: 080AE2A0, @v_r: 080AE2B0
Works
        0.0       5.0      10.0      15.0


Program received signal SIGSEGV, Segmentation fault.
0x08049a70 in fpc_ansistr_decr_ref ()

3.) When I try to use the assembler keyword for SSEAdd() I get the
compile error 'Asm: [movups xmmreg,reg32] invalid combination of
opcode and operands'. Why doesn't it work this way?
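
A possible explanation for questions 2 and 3, offered as an untested assumption rather than something established in this message: on i386, const and out parameters of an array type are usually passed as pointers. Inside the asm block, v1, v2 and v_r would then name the locations holding those pointers, not the arrays themselves. That would fit the error in 3.), where the parameter appears as a 32-bit register. The program name SSEAddSketch and the explicit pointer loads in the sketch below are illustrative and not part of the original program. It loads both operands with movups and adds register to register, so addps never sees a memory operand, which it would require to be 16-byte aligned.

```pascal
program SSEAddSketch;

{$mode objfpc}{$H+}
{$ASMMODE ATT}

type
  TSSESingle = array[0..3] of Single;

// Assumption: i386 target, where const/out array parameters arrive as
// pointers, so each one is loaded into a register and dereferenced.
procedure SSEAdd(const v1, v2: TSSESingle; out v_r: TSSESingle);
begin
  asm
    movl   v1, %eax        // fetch the address held in v1
    movl   v2, %edx        // fetch the address held in v2
    movl   v_r, %ecx       // fetch the address held in v_r
    movups (%eax), %xmm0   // unaligned load of the four singles of v1
    movups (%edx), %xmm1   // unaligned load of the four singles of v2
    addps  %xmm1, %xmm0    // register-to-register add, no memory operand
    movups %xmm0, (%ecx)   // unaligned store into v_r
  end ['eax', 'edx', 'ecx'];
end;

var
  A, B, C: TSSESingle;
  I: Integer;
begin
  for I := Low(A) to High(A) do begin
    A[I] := 2 * I;
    B[I] := 3 * I;
  end;

  SSEAdd(A, B, C);

  for I := Low(C) to High(C) do
    write(C[I]:10:1);
  writeln;
end.
```

If this assumption holds, the crash on exit in the run without addps could have the same cause: the final movups would write 16 bytes over the stack slot that holds the v_r pointer and whatever lies next to it.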

I'm using
mm@grizzly:~/Development/SSETests$ fpc -i
Free Pascal Compiler version 2.2.0

Compiler Date      : 2007/12/29
Compiler CPU Target: i386

...

Supported FPU instruction sets:
   SOFT
   X87
   SSE
   SSE2
   SSE3

...

under Linux on a Core 2 Duo machine.

Regards

Michael


