[fpc-devel] using sse2 packed doubles

Daniël Mantione daniel.mantione at freepascal.org
Sun Oct 8 15:18:12 CEST 2006



On Sun, 8 Oct 2006, Vincent Snijders wrote:

> Daniël Mantione wrote:
> >
> > On Sat, 7 Oct 2006, Florian Klaempfl wrote:
> >
> > > Vincent Snijders wrote:
> > >
> > > I started to add Vector Pascal-like support; currently only
> > > i386/x86_64 are supported (no generic support). The whole
> > > (currently implemented) functionality is demonstrated by the
> > > following example. Please give some feedback on whether it allows
> > > benchmark speedups.
> >
> > To get a large speedup, I think that instead of making pairs of
> > doubles, you should do the pixels in parallel. I.e. in this benchmark,
> > a row is 3000 pixels wide, so make an array of 3000 doubles and do the
> > operation with arrays. With proper compiler optimization, it should be
> > possible to achieve speeds close to 2 flops per clock cycle.
>
> The 'problem' in this benchmark is that the number of iterations of the
> inner loop isn't fixed, but can vary between 1 and 50. If you pair two
> doubles, the chance that you can break out of the loop for all elements
> of the vector before iteration 50 is bigger than when you combine 3000
> elements.

You are right. How about doing it in blocks of 8x8 pixels? The
high-iteration loops are concentrated close to the borders of
the set, so for most blocks the iteration can then be ended early.
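
A minimal sketch of the 8x8 block idea in plain C, assuming the benchmark
is the usual Mandelbrot iteration z = z*z + c with escape radius 2 and a
cap of 50 iterations. It is not the benchmark's code and does not use the
new vector support; the name mandel_block and the constants BLOCK and
MAX_ITER are made up for illustration. All 64 pixels advance in lockstep,
and the loop stops as soon as every pixel in the block has escaped:

```c
#include <stddef.h>

#define BLOCK    8    /* block edge length, as suggested above (assumption) */
#define MAX_ITER 50   /* iteration cap mentioned in the thread */
#define NPIX     (BLOCK * BLOCK)

/* Iterate one BLOCK x BLOCK tile in lockstep.
 * (cr0, ci0) is the c value of the top-left pixel, step the pixel spacing.
 * On return in_set[k] is 1 if pixel k did not escape within MAX_ITER.
 * The return value is the number of lockstep iterations executed. */
int mandel_block(double cr0, double ci0, double step,
                 unsigned char in_set[NPIX])
{
    double cr[NPIX], ci[NPIX], zr[NPIX], zi[NPIX];
    int alive = NPIX;   /* pixels that have not escaped yet */
    int iter, k;

    for (k = 0; k < NPIX; k++) {
        cr[k] = cr0 + (k % BLOCK) * step;
        ci[k] = ci0 + (k / BLOCK) * step;
        zr[k] = 0.0;
        zi[k] = 0.0;
        in_set[k] = 1;
    }

    /* Early exit: stop once the whole block has escaped. */
    for (iter = 0; iter < MAX_ITER && alive > 0; iter++) {
        /* Every lane is updated unconditionally, so this loop has the
         * same shape as a packed (vector) operation over the block.
         * Lanes that already escaped may overflow to inf/NaN; that is
         * harmless because their in_set flag is never set again. */
        for (k = 0; k < NPIX; k++) {
            double nr = zr[k] * zr[k] - zi[k] * zi[k] + cr[k];
            double ni = 2.0 * zr[k] * zi[k] + ci[k];
            zr[k] = nr;
            zi[k] = ni;
            if (in_set[k] && nr * nr + ni * ni > 4.0) {
                in_set[k] = 0;
                alive--;
            }
        }
    }
    return iter;
}
```

A block far outside the set finishes after a single pass, while a block
entirely inside it runs all 50 iterations; only blocks that straddle the
border pay the full cost for pixels that escaped early.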

Daniël

