[fpc-devel] using sse2 packed doubles

Sun Oct 8 10:42:55 CEST 2006

Daniël Mantione wrote:
> 
> On Sat, 7 Oct 2006, Florian Klaempfl wrote:
> 
>> Vincent Snijders wrote:
>>> Daniël Mantione wrote:
>>>> On Fri, 6 Oct 2006, Micha Nelissen wrote:
>>>>
>>>>
>>>>> Vincent Snijders wrote:
>>>>>
>>>> You could also start an assembler implementation of the matrix unit.
>>>> I suppose using it is allowed, and a Tvector2_double looks a lot like
>>>> such a double2.
>>> Unless the compiler somehow helps, inlining the assembler implementation
>>> won't work and then the speedup might be lost again.
>> I started to add Vector Pascal-like support; currently only i386/x86_64
>> are supported (no generic support). The whole (currently implemented)
>> functionality is demonstrated by the following example. Please give some
>> feedback on whether it allows benchmark speedups.
> 
> To get a large speedup, I think that instead of making pairs of
> doubles, you should process the pixels in parallel. I.e., in this
> benchmark a row is 3000 pixels wide, so make an array of 3000 doubles
> and do the operations on whole arrays. With proper compiler
> optimization, it should be possible to achieve speeds close to 2 flops
> per clock cycle.

This is planned, but currently it only spits out an IE (internal error) :)
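
For illustration only, here is a rough sketch of what the array-wise
approach suggested above could look like when written by hand. It is in C
with SSE2 intrinsics, not FPC's vector syntax, and it is not what the
compiler does today. The helper name row_norm_sq and the operation it
computes (a[i]*a[i] + b[i]*b[i]) are made up for the example; only the row
width of 3000 comes from the discussion.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

#define ROW_WIDTH 3000   /* row width mentioned in the thread */

/* Hypothetical helper: out[i] = a[i]*a[i] + b[i]*b[i] for a whole row. */
static void row_norm_sq(const double *a, const double *b,
                        double *out, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    /* Two doubles fit in one 128-bit register. Unaligned loads and
       stores keep the sketch simple; aligned ones may be faster. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        __m128d r  = _mm_add_pd(_mm_mul_pd(va, va), _mm_mul_pd(vb, vb));
        _mm_storeu_pd(out + i, r);
    }
#endif
    /* Scalar tail for odd n, and the full loop when SSE2 is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] * a[i] + b[i] * b[i];
}
```

A caller would keep one array of ROW_WIDTH doubles per quantity and call
row_norm_sq(a, b, out, ROW_WIDTH) once per row, so the inner loop works on
whole arrays rather than on individual pairs.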