[fpc-devel] using sse2 packed doubles

Sun Oct 8 14:10:04 CEST 2006

Florian Klaempfl wrote:
> Vincent Snijders wrote:
> 
>> Daniël Mantione wrote:
>>
>>>
>>> On Fri, 6 Oct 2006, Micha Nelissen wrote:
>>>
>>>
>>>> Vincent Snijders wrote:
>>>>
>>> You could also start an assembler implementation of the matrix unit. 
>>> I suppose using it is allowed, and a Tvector2_double looks a lot like 
>>> such a double2.
>>
>>
>> Unless the compiler somehow helps, inlining the assembler 
>> implementation won't work and then the speedup might be lost again.
> 
> 
> I started to add vector pascal like support, currently only i386/x86_64 
> are supported (no generic support). The whole (currently implemented) 
> functionality is demonstrated by the following example. Please give some 
> feedback if it allows benchmark speedups.

Thanks Florian, for starting the vector support.

I think this would help speed up the benchmarks. I cannot give a real 
estimate of how much, maybe 20% or so.

There are still some bugs (or things not implemented).

Given the following program:

var
   ad1,ad2,ad3 : array[0..1] of double;

begin
   ad2[0] := 1;
   ad2[1] := 3;
   ad3[0] := 9;
   ad3[1] := 12;
   ad1:=ad2+ad3;
   writeln(ad1[1]);
end.

It writes:
  0.000000000000000E+000
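
For comparison, here is a scalar version of the same addition, as a 
minimal sketch. It assumes the value 12 belongs in ad3[1]; the program 
name and the loop variable i are my own additions. Under that assumption 
ad1[1] should come out as 15, not 0.

```pascal
program scalaradd;

var
   ad1,ad2,ad3 : array[0..1] of double;
   i : integer;

begin
   ad2[0] := 1;
   ad2[1] := 3;
   ad3[0] := 9;
   ad3[1] := 12;   { assumption: 12 is meant for ad3[1] }
   { add element by element; ad1:=ad2+ad3 should give the same values }
   for i := 0 to 1 do
      ad1[i] := ad2[i] + ad3[i];
   writeln(ad1[1]);
end.
```

Since each element is written to memory inside the loop, the writeln 
reads the stored sum here.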

Looking at the assembler, I see that the writeln reads ad1 from memory, 
but the sum is still only in the %xmm0 register and has not been stored 
to ad1.

Furthermore, I encountered problems with alignment.

Vincent
