[fpc-devel] using sse2 packed doubles
Vincent Snijders
vsnijders at quicknet.nl
Sun Oct 8 15:20:55 CEST 2006
Daniël Mantione wrote:
>
> On Sun, 8 Oct 2006, Vincent Snijders wrote:
>
>
>>Daniël Mantione wrote:
>>
>>>On Sat, 7 Oct 2006, Florian Klaempfl wrote:
>>>
>>>
>>>
>>>>Vincent Snijders wrote:
>>>>
>>>>I started to add Vector Pascal-like support; currently only i386/x86_64
>>>>are supported (no generic support). The whole (currently implemented)
>>>>functionality is demonstrated by the following example. Please give some
>>>>feedback on whether it allows benchmark speedups.
>>>
>>>
>>>To get a large speedup, I think that instead of making pairs of doubles
>>>you should do the pixels in parallel. I.e. in this benchmark a row is 3000
>>>pixels wide, so make an array of 3000 doubles and do the operations on
>>>arrays. With proper compiler optimization, it should be possible to
>>>achieve speeds close to 2 flops per clock cycle.
>>>
>>
>>The 'problem' in this benchmark is that the number of iterations of the inner
>>loop isn't fixed, but can vary between 1 and 50. If you pair two doubles, the
>>chance that you can break out of the loop for all elements of the vector
>>before iteration 50 is bigger than when you combine 3000 elements.
>
>
> You are right. How about doing it in blocks of 8x8 pixels? The
> high iteration loops are concentrated close to the borders of
> the set, so for most blocks the iteration can then be ended early.
For starters I was thinking about blocks of 1x2 pixels ;-). The current
hardware doesn't allow any more parallelism anyway. Or am I making a
mistake in my thinking now?
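
To make the trade-off concrete, here is a minimal sketch in Python (not the
benchmark's Pascal source, and not the compiler's vector support) that
iterates a block of pixels in lockstep, the way SIMD lanes would, and stops
the block only once every pixel in it has escaped. The names iterate_block
and lane_iterations, the escape test |z|^2 > 4 and the sample points are
assumptions made for illustration; only the cap of 50 iterations comes from
the thread.

```python
MAX_ITER = 50    # iteration cap mentioned in the thread
LIMIT_SQ = 4.0   # assumed escape test: |z|^2 > 4


def iterate_block(points, max_iter=MAX_ITER):
    """Iterate z -> z^2 + c for all points in lockstep.

    Returns (per-point iteration counts, number of lockstep passes).
    A point that never escapes gets a count of max_iter.
    """
    n = len(points)
    zr = [0.0] * n
    zi = [0.0] * n
    counts = [max_iter] * n
    active = [True] * n
    passes = 0
    for i in range(max_iter):
        if not any(active):
            break  # every lane has escaped, so the whole block can stop
        passes += 1
        for k, (cr, ci) in enumerate(points):
            if not active[k]:
                # Real SIMD hardware would keep computing this lane and
                # mask its result; skipping only saves work in this sketch.
                continue
            r, im = zr[k], zi[k]
            zr[k] = r * r - im * im + cr
            zi[k] = 2.0 * r * im + ci
            if zr[k] * zr[k] + zi[k] * zi[k] > LIMIT_SQ:
                active[k] = False
                counts[k] = i + 1
    return counts, passes


def lane_iterations(row, block):
    """Total lane iterations spent on a row when processed in blocks.

    Each lockstep pass costs one iteration per lane in the block, even
    for lanes that have already escaped.
    """
    total = 0
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        _, passes = iterate_block(chunk)
        total += passes * len(chunk)
    return total
```

For a row of three quickly escaping points followed by one that never
escapes, e.g. [(2.0, 2.0)] * 3 + [(0.0, 0.0)], lane_iterations gives 53, 102
and 200 for block sizes 1, 2 and 4. Wider blocks therefore only pay off where
neighbouring pixels need similar iteration counts.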
Vincent