[fpc-devel] using sse2 packed doubles
Vincent Snijders
vsnijders at quicknet.nl
Sun Oct 8 14:10:04 CEST 2006
Florian Klaempfl wrote:
> Vincent Snijders wrote:
>
>> Daniël Mantione wrote:
>>
>>>
>>> On Fri, 6 Oct 2006, Micha Nelissen wrote:
>>>
>>>
>>>> Vincent Snijders wrote:
>>>>
>>> You could also start an assembler implementation of the matrix unit.
>>> I suppose using it is allowed, and a Tvector2_double looks a lot like
>>> such a double2.
>>
>>
>> Unless the compiler somehow helps, inlining the assembler
>> implementation won't work and then the speedup might be lost again.
>
>
> I started to add Vector Pascal-like support; currently only i386/x86_64
> are supported (no generic support). The whole (currently implemented)
> functionality is demonstrated by the following example. Please give some
> feedback if it allows benchmark speedups.
Thanks, Florian, for starting the vector support.
I think this would help speed up the benchmarks. I cannot give a real
estimate of how much, maybe 20% or so.
There are still some bugs (or things not yet implemented).
Given the following program:
var
  ad1, ad2, ad3 : array[0..1] of double;
begin
  ad2[0] := 1;
  ad2[1] := 3;
  ad3[0] := 9;
  ad3[1] := 12;
  ad1 := ad2 + ad3;
  writeln(ad1[1]);
end.
It writes:
0.000000000000000E+000
Looking at the generated assembler, I see that the ad1 passed to writeln
is read from memory, but the result of the addition is still only in the
%xmm0 register.
I also ran into problems with alignment.
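
For comparison, the intended result can be checked with a plain scalar
loop that does not rely on the new vector support at all. This is only a
sketch: the program name scalarcheck and the loop variable i are made up
for illustration, and it assumes that the last assignment in the test
program is meant to set ad3[1] to 12.

program scalarcheck;
{ Scalar reference: adds the two arrays element by element,
  without using any array (vector) operators. }
var
  ad1, ad2, ad3 : array[0..1] of double;
  i : integer;
begin
  ad2[0] := 1;
  ad2[1] := 3;
  ad3[0] := 9;
  ad3[1] := 12;  { assumption: ad3[1] is the intended target }
  for i := 0 to 1 do
    ad1[i] := ad2[i] + ad3[i];  { each element is stored to memory }
  writeln(ad1[1]);
end.

Under that assumption the value written should be 15 (3 + 12), so the 0
printed by the vectorised version means the sum never reached memory.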
Vincent