[fpc-devel] using sse2 packed doubles

Daniël Mantione daniel.mantione at freepascal.org
Sun Oct 8 15:18:12 CEST 2006



On Sun, 8 Oct 2006, Vincent Snijders wrote:

> Daniël Mantione wrote:
> > 
> > On Sat, 7 Oct 2006, Florian Klaempfl wrote:
> > 
> > 
> > > Vincent Snijders wrote:
> > > 
> > > I started to add Vector Pascal-like support; currently only i386/x86_64
> > > are supported (no generic support). All of the currently implemented
> > > functionality is demonstrated by the following example. Please give
> > > some feedback on whether it allows benchmark speedups.
> > 
> > 
> > To get a large speedup, I think that instead of making pairs of doubles
> > you should process the pixels in parallel. In this benchmark, for
> > example, a row is 3000 pixels wide, so make an array of 3000 doubles and
> > do the operations on whole arrays. With proper compiler optimization, it
> > should be possible to achieve speeds close to 2 flops per clock cycle.
> > 
> 
> The 'problem' in this benchmark is that the number of iterations of the inner
> loop isn't fixed, but can vary between 1 and 50, and a vector can only leave
> the loop once all of its elements have escaped. If you pair two doubles, the
> chance that you can break the loop for all elements of the vector before
> iteration 50 is bigger than when you combine 3000 elements.
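To make the trade-off concrete, here is a minimal sketch of the cost model (plain Python, not FPC code). It cuts each row into groups of `width` pixels that must keep iterating until their slowest pixel is done, then counts the lane-iterations spent. It assumes the usual benchmark setup, which is not spelled out in this thread: the region [-1.5, 0.5] x [-1, 1], escape radius 2, and at most 50 iterations. It also uses a small hypothetical 96x96 image instead of 3000x3000. The names `escape_iters`, `iteration_grid` and `row_lane_work` are illustrative only.

```python
# Assumed benchmark parameters: region [-1.5, 0.5] x [-1, 1], |z| > 2 escapes.
MAX_ITER = 50  # iteration cap mentioned in the thread


def escape_iters(cr, ci, max_iter=MAX_ITER):
    """Iterations until |z| > 2 for c = cr + ci*i, capped at max_iter."""
    zr = zi = 0.0
    for i in range(max_iter):
        # z := z*z + c, using the old zr/zi on the right-hand side
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:
            return i + 1
    return max_iter


def iteration_grid(n):
    """Escape counts for a hypothetical n x n image of the assumed region."""
    return [[escape_iters(2.0 * x / n - 1.5, 2.0 * y / n - 1.0)
             for x in range(n)] for y in range(n)]


def row_lane_work(grid, width):
    """Lane-iterations spent when each row is cut into groups of `width`
    pixels and a group keeps iterating until its slowest pixel is done."""
    work = 0
    for row in grid:
        for start in range(0, len(row), width):
            group = row[start:start + width]
            work += len(group) * max(group)
    return work


grid = iteration_grid(96)
for width in (1, 2, 8, 96):
    print(f"width {width:3d}: {row_lane_work(grid, width)} lane-iterations")
```

Width 1 gives the useful (scalar) work. Because the groups for widths 1, 2, 8 and 96 nest inside each other, the total can only grow as the width grows. A whole-row vector pays for 50 iterations on every row that touches the set.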

You are right. How about doing it in blocks of 8x8 pixels? The
high-iteration pixels are concentrated close to the border of the
set, so for most blocks the iteration can be ended early.
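The block idea can be sketched as follows, again as a hypothetical Python model rather than SSE2 code. All pixels of an 8x8 block are iterated in lockstep like SIMD lanes. Lanes that have escaped are masked off, and the block stops as soon as every lane has escaped. The image region [-1.5, 0.5] x [-1, 1], the 96x96 size and the names `iterate_block` and `block_steps` are assumptions made for illustration.

```python
MAX_ITER = 50  # iteration cap mentioned in the thread
BLOCK = 8      # block edge length suggested above


def iterate_block(cs, max_iter=MAX_ITER):
    """Iterate all points of one block in lockstep, like SIMD lanes.

    cs is a list of (cr, ci) pairs. Returns (counts, steps): per-point
    escape counts (max_iter = never escaped) and the number of lockstep
    steps executed before the whole block could stop.
    """
    n = len(cs)
    zr = [0.0] * n
    zi = [0.0] * n
    counts = [None] * n  # None = lane still active
    for step in range(1, max_iter + 1):
        for k, (cr, ci) in enumerate(cs):
            if counts[k] is not None:
                continue  # lane already escaped: masked off
            zr[k], zi[k] = (zr[k] * zr[k] - zi[k] * zi[k] + cr,
                            2.0 * zr[k] * zi[k] + ci)
            if zr[k] * zr[k] + zi[k] * zi[k] > 4.0:
                counts[k] = step
        if all(c is not None for c in counts):
            return counts, step  # every lane escaped: end the block early
    return [max_iter if c is None else c for c in counts], max_iter


def block_steps(n, block=BLOCK):
    """Total lockstep steps over a hypothetical n x n image in block x block tiles."""
    total = 0
    for by in range(0, n, block):
        for bx in range(0, n, block):
            cs = [(2.0 * x / n - 1.5, 2.0 * y / n - 1.0)
                  for y in range(by, min(by + block, n))
                  for x in range(bx, min(bx + block, n))]
            _, steps = iterate_block(cs)
            total += steps
    return total


n = 96
tiles = (n // BLOCK) ** 2
print(f"{tiles} blocks, average {block_steps(n) / tiles:.1f} "
      f"of {MAX_ITER} steps per block")
```

Comparing the average steps per block against the cap of 50 shows how often a block can stop early. The actual figure depends on the assumed image size and region. Blocks lying entirely inside the set still run all 50 steps.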

Daniël

