[fpc-devel] using sse2 packed doubles
Daniël Mantione
daniel.mantione at freepascal.org
Sun Oct 8 15:18:12 CEST 2006
On Sun, 8 Oct 2006, Vincent Snijders wrote:
> Daniël Mantione wrote:
> >
> > On Sat, 7 Oct 2006, Florian Klaempfl wrote:
> >
> > > Vincent Snijders wrote:
> > >
> > > I started to add Vector Pascal-like support; currently only
> > > i386/x86_64 are supported (no generic support). The whole (currently
> > > implemented) functionality is demonstrated by the following example.
> > > Please give some feedback if it allows benchmark speedups.
> >
> > To get a large speedup, I think that instead of making pairs of
> > doubles, you should do the pixels in parallel. I.e. in this benchmark, a
> > row is 3000 pixels wide, so make an array of 3000 doubles and do the
> > operation with arrays. With proper compiler optimization, it should be
> > possible to achieve speeds close to 2 flops per clock cycle.
> >
>
> The 'problem' in this benchmark is that the number of iterations of the
> inner loop isn't fixed, but can vary between 1 and 50. If you pair two
> doubles, the chance that you can break out of the loop for all elements
> of the vector before iteration 50 is bigger than when you combine 3000
> elements.
You are right. How about doing it in blocks of 8x8 pixels? The
high-iteration loops are concentrated close to the borders of
the set, so for most blocks the iteration can then be ended early.
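
To make the block idea concrete, here is a rough sketch in plain C rather
than in the new FPC vector syntax. It is untuned and has not been measured
against the benchmark. The names mandel_block, BLOCK and MAX_ITER are made up
for illustration, and the escape test |z|^2 > 4 is my assumption about what
the benchmark uses. The cap of 50 iterations is the one mentioned above. The
point is only that the pass loop stops as soon as every pixel in the tile has
escaped.

```c
#include <assert.h>

#define BLOCK 8        /* tile edge in pixels (hypothetical name) */
#define MAX_ITER 50    /* iteration cap mentioned in this thread */

/* Iterate one BLOCK x BLOCK tile. (cr0, ci0) is the complex value of the
   tile's first pixel; step is the distance between neighbouring pixels.
   iters[] receives, per pixel, the pass in which it escaped, or MAX_ITER
   if it had not been seen to escape earlier.
   Returns the number of passes actually made over the tile. */
int mandel_block(double cr0, double ci0, double step,
                 int iters[BLOCK * BLOCK])
{
    double zr[BLOCK * BLOCK] = {0.0};
    double zi[BLOCK * BLOCK] = {0.0};
    int active = BLOCK * BLOCK;   /* pixels that have not escaped yet */
    int pass = 0;

    assert(step > 0.0);

    for (int k = 0; k < BLOCK * BLOCK; k++)
        iters[k] = 0;             /* 0 marks "still iterating" */

    /* Stop early once the whole tile has escaped. */
    while (pass < MAX_ITER && active > 0) {
        for (int y = 0; y < BLOCK; y++) {
            for (int x = 0; x < BLOCK; x++) {
                int k = y * BLOCK + x;
                if (iters[k] != 0)
                    continue;     /* this pixel is already done */
                double cr = cr0 + x * step;
                double ci = ci0 + y * step;
                double t = zr[k] * zr[k] - zi[k] * zi[k] + cr;
                zi[k] = 2.0 * zr[k] * zi[k] + ci;
                zr[k] = t;
                if (zr[k] * zr[k] + zi[k] * zi[k] > 4.0) {
                    iters[k] = pass + 1;
                    active--;
                }
            }
        }
        pass++;
    }

    /* Pixels that never escaped get the cap. */
    for (int k = 0; k < BLOCK * BLOCK; k++)
        if (iters[k] == 0)
            iters[k] = MAX_ITER;

    return pass;
}
```

A tile far outside the set should return after a single pass, while a tile
well inside it runs all 50. In a real vectorised version, the per-pixel
"continue" would presumably become a mask, so that the inner loops stay
branch-free.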
Daniël