[fpc-devel] SSE operation addps doesn't work in subroutine
Michael Müller
mueller_michael at nikocity.de
Sun Jan 13 01:36:45 CET 2008
Hi,
today I tried to use SSE operations, but I couldn't get the result I
wanted. Here is a simple example:
program SSETestsSimple;

{$mode objfpc}{$H+}
{$ALIGN 16}

uses
  SysUtils,
  MMX;

type
  TSSESingle = array[0..3] of Single;

{$ASMMODE ATT}

procedure SSEAdd(const v1, v2: TSSESingle; out v_r: TSSESingle); // assembler;
begin
  writeln(Format('@v1: %s, @v2: %s, @v_r: %s', [HexStr(Addr(v1)),
    HexStr(Addr(v2)), HexStr(Addr(v_r))]));
  asm
    movups v_r, %xmm0 // Just for testing
    movups v2, %xmm0 // Just for testing
    movups v1, %xmm0
    // addps v2, %xmm0
    movups %xmm0, v_r
  end;
end;

var
  A,
  B,
  C: TSSESingle;
  I: Integer;
begin
  writeln('is_sse_cpu: ', BoolToStr(is_sse_cpu, True));
  for I := Low(A) to High(A) do begin
    A[I] := 2 * I;
    B[I] := 3 * I;
  end;
  writeln('Doing SSE in main program...');
  writeln(Format('@A: %s, @B: %s, @C: %s', [HexStr(Addr(A)),
    HexStr(Addr(B)), HexStr(Addr(C))]));
  asm
    movups A, %xmm0
    addps B, %xmm0
    movups %xmm0, C
  end;
  writeln('Works');
  writeln('Doing SSE in subroutine...');
  SSEAdd(A, B, C);
  writeln('Works');
  for I := Low(C) to High(C) do
    write(C[I]:10:1);
  writeln;
  readln;
end.
I have three questions:
1.) Florian, in the mailing list archive I found an answer from 2004 in
which you said that 'FPC doesn't align the stack properly to
sixteen byte boundaries currently.'. This still seems to be a problem,
even with {$ALIGN 16}, right?
2.) I made the variables global so that I can influence their
positions by adding dummy variables (which is not needed anymore in
this simple example), and I pass them as const and out parameters so
that the global variables are used directly (see the addresses in the
output). The assembler lines in the main program work, but in
SSEAdd() the addps line raises a SIGSEGV.
Can somebody tell me why it doesn't work in the subroutine?
With the addps line in SSEAdd():
(gdb) run
Starting program: /home/mm/Development/SSETests/ssetestssimple
is_sse_cpu: True
Doing SSE in main program...
@A: 08085BE0, @B: 08085BF0, @C: 08085C00
Works
Doing SSE in subroutine...
@v1: 08085BE0, @v2: 08085BF0, @v_r: 08085C00
Program received signal SIGSEGV, Segmentation fault.
SSEADD (V1={0, 2, 4, 6}, V2={0, 3, 6, 9}, V_R={0, 5, 10, 15})
at ssetestssimple.pas:23
23 addps v2, %xmm0
Btw: without the addps line in SSEAdd(), the program crashes when
exiting (after hitting Enter):
(gdb) run
Starting program: /home/mm/Development/SSETests/ssetestssimple
is_sse_cpu: True
Doing SSE in main program...
@A: 080AE290, @B: 080AE2A0, @C: 080AE2B0
Works
Doing SSE in subroutine...
@v1: 080AE290, @v2: 080AE2A0, @v_r: 080AE2B0
Works
0.0 5.0 10.0 15.0
Program received signal SIGSEGV, Segmentation fault.
0x08049a70 in fpc_ansistr_decr_ref ()
3.) When I try to use the assembler keyword for SSEAdd() I get the
compile error 'Asm: [movups xmmreg,reg32] invalid combination of
opcode and operands'. Why doesn't it work this way?
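One possible explanation for questions 2 and 3, offered as an untested guess and not a confirmed answer: on i386 a const or out parameter of type TSSESingle is passed by reference, so inside the asm block the names v1, v2 and v_r may denote the 4-byte slot (or register) holding the pointer, not the array itself. In that case movups would read or overwrite 16 bytes around that slot, addps with a memory operand would fault whenever the slot is not 16-byte aligned, and with the assembler keyword the parameters may sit in 32-bit registers, which movups cannot take as an operand. The sketch below assumes this is what happens and loads the pointers into general-purpose registers first. It is i386-only, and the program name SSEAddSketch and the register list after end are choices made for this illustration.

```pascal
program SSEAddSketch;

{$mode objfpc}{$H+}
{$ASMMODE ATT}

type
  TSSESingle = array[0..3] of Single;

// Assumption: v1, v2 and v_r arrive as pointers (by reference) on i386,
// so each one is dereferenced through a general-purpose register.
procedure SSEAdd(const v1, v2: TSSESingle; out v_r: TSSESingle);
begin
  asm
    movl   v1, %eax        // address of the first operand
    movl   v2, %edx        // address of the second operand
    movl   v_r, %ecx       // address of the result
    movups (%eax), %xmm0   // unaligned load of v1
    movups (%edx), %xmm1   // unaligned load of v2
    addps  %xmm1, %xmm0    // register operand, so no alignment requirement
    movups %xmm0, (%ecx)   // unaligned store into v_r
  end ['EAX', 'EDX', 'ECX'];
end;

var
  A, B, C: TSSESingle;
  I: Integer;
begin
  for I := Low(A) to High(A) do begin
    A[I] := 2 * I;
    B[I] := 3 * I;
  end;
  SSEAdd(A, B, C);
  for I := Low(C) to High(C) do
    write(C[I]:10:1);
  writeln;
end.
```

Adding through a second XMM register instead of a memory operand sidesteps the alignment question entirely, at the cost of one extra movups.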
I'm using
mm@grizzly:~/Development/SSETests$ fpc -i
Free Pascal Compiler version 2.2.0
Compiler Date : 2007/12/29
Compiler CPU Target: i386
...
Supported FPU instruction sets:
SOFT
X87
SSE
SSE2
SSE3
...
under Linux on a Core 2 Duo machine.
Regards
Michael