[fpc-devel] using sse2 packed doubles

Daniël Mantione daniel.mantione at freepascal.org
Sun Oct 8 10:41:21 CEST 2006



On Sat, 7 Oct 2006, Florian Klaempfl wrote:

> Vincent Snijders wrote:
> > Daniël Mantione wrote:
> > >
> > > On Fri, 6 Oct 2006, Micha Nelissen wrote:
> > >
> > > > Vincent Snijders wrote:
> > > >
> > > You could also start an assembler implementation of the matrix unit.
> > > I suppose using it is allowed, and a Tvector2_double looks a lot like
> > > such a double2.
> >
> > Unless the compiler somehow helps, inlining the assembler implementation
> > won't work and then the speedup might be lost again.
>
> I started to add Vector Pascal-like support; currently only i386/x86_64 are
> supported (no generic support). The whole (currently implemented)
> functionality is demonstrated by the following example. Please give some
> feedback if it allows benchmark speedups.

To get a large speedup, I think that instead of making pairs of doubles
you should process the pixels in parallel. That is, in this benchmark a
row is 3000 pixels wide, so make an array of 3000 doubles and do the
operations on whole arrays. With proper compiler optimization, it should
be possible to achieve speeds close to 2 flops per clock cycle.
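
A minimal sketch of the array-per-row idea, under stated assumptions: the
multiply-add step and the names TRow and MulAddRow are hypothetical and
only illustrate the data layout; they are not taken from the benchmark or
from the compiler's vector support. Each element is independent of the
others, which is what would let a vectorizing compiler handle two doubles
per SSE2 instruction.

```pascal
program RowParallel;
{$mode objfpc}

const
  RowWidth = 3000;  // row width mentioned for the benchmark

type
  // One value per pixel of a row, instead of isolated pairs of doubles.
  TRow = array[0..RowWidth - 1] of Double;

// Hypothetical per-pixel step: Dst[i] := A[i] * B[i] + C[i].
// One multiply and one add per element, with no dependency between
// elements, so the loop is a candidate for packed SSE2 code.
procedure MulAddRow(const A, B, C: TRow; out Dst: TRow);
var
  i: Integer;
begin
  for i := 0 to RowWidth - 1 do
    Dst[i] := A[i] * B[i] + C[i];
end;

var
  A, B, C, R: TRow;
  i: Integer;
begin
  // Fill the inputs with simple, easily checked values.
  for i := 0 to RowWidth - 1 do
  begin
    A[i] := i;
    B[i] := 2.0;
    C[i] := 1.0;
  end;

  MulAddRow(A, B, C, R);

  // Print the first and last results as a quick sanity check.
  WriteLn(R[0]:0:1, ' ', R[RowWidth - 1]:0:1);
end.
```

Written as a plain loop, this compiles to scalar code unless the compiler
vectorizes it; the point is only that whole-row operations expose far more
parallel work than a single pair of doubles does.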

Daniël

