# [fpc-devel] Vectorization

J. Gareth Moreton gareth at moreton-family.com
Sun Dec 10 02:29:26 CET 2017

Hi everyone,

Since I'm masochistic in my desire to understand and improve the Free Pascal Compiler, I would like to add
some vectorisation support in its optimisation cycle, since that is one thing that many other compilers
attempt to do these days.  But before I begin, does FPC support any kind of vectorisation already?  If it
does I haven't been able to find it yet, and I don't want to end up reinventing the wheel.

I recall cases, for example, where code like the following is not vectorised even if the compiler is set to use SSE:

type
  TVector4f = packed record
    X, Y, Z, W: Single;
  end;

begin
  Result.X := A.X + B.X;
  Result.Y := A.Y + B.Y;
  Result.Z := A.Z + B.Z;
  Result.W := A.W + B.W;
end;
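
For anyone wanting to reproduce this, the fragment above can be wrapped into a complete program along the following lines. This is only a sketch: the function name AddVectors, the by-value parameters and the test values are assumptions made for illustration, since the post shows only the record type and the function body.

```pascal
program vecadd;

{$mode objfpc}  { needed so that the Result identifier is available }

type
  TVector4f = packed record
    X, Y, Z, W: Single;
  end;

{ Hypothetical wrapper: the name and the by-value parameters are
  assumptions, the original post gives only the body. }
function AddVectors(A, B: TVector4f): TVector4f;
begin
  Result.X := A.X + B.X;
  Result.Y := A.Y + B.Y;
  Result.Z := A.Z + B.Z;
  Result.W := A.W + B.W;
end;

var
  A, B, R: TVector4f;
begin
  { Arbitrary test values, chosen so the sums are easy to check by eye }
  A.X := 1;  A.Y := 2;  A.Z := 3;  A.W := 4;
  B.X := 10; B.Y := 20; B.Z := 30; B.W := 40;
  R := AddVectors(A, B);
  WriteLn(R.X:0:1, ' ', R.Y:0:1, ' ', R.Z:0:1, ' ', R.W:0:1);
end.
```

Compiling with something like "fpc -CfSSE64 -al vecadd.pas" should keep the generated assembler file so that the code for AddVectors can be inspected. The exact instructions will likely vary with compiler version, target and optimisation settings.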

The resultant assembler code uses an individual MOVSS load, a scalar addition and a MOVSS store for each
element, rather than combining the reads and writes into MOVUPS instructions and reducing the number of
arithmetic instructions by a factor of 4.  For clarity, this is the assembler produced with '-CfSSE64':

.balign 16,0x90
.Lc1:
leaq	-56(%rsp),%rsp
.Lc3:
.seh_stackalloc 56
.seh_endprologue
movq	%rcx,%rax
movq	%rdx,(%rsp)
movq	%r8,8(%rsp)
movq	(%rsp),%rdx
movq	(%rdx),%rcx
movq	%rcx,16(%rsp)
movq	8(%rdx),%rdx
movq	%rdx,24(%rsp)
movq	8(%rsp),%rdx
movq	(%rdx),%rcx
movq	%rcx,32(%rsp)
movq	8(%rdx),%rdx
movq	%rdx,40(%rsp)
movss	16(%rsp),%xmm0
addss	32(%rsp),%xmm0
movss	%xmm0,(%rax)
movss	20(%rsp),%xmm0
addss	36(%rsp),%xmm0
movss	%xmm0,4(%rax)
movss	24(%rsp),%xmm0
addss	40(%rsp),%xmm0
movss	%xmm0,8(%rax)
movss	28(%rsp),%xmm0
addss	44(%rsp),%xmm0
movss	%xmm0,12(%rax)
leaq	56(%rsp),%rsp
ret
.seh_endproc
.Lc2:

A good vectoriser (for lack of a better name!) would be able to reduce the 12 movss/addss instructions to just
"movups 16(%rsp),%xmm0  addps 32(%rsp),%xmm0  movups %xmm0,(%rax)" - and since the stack is aligned to a
16-byte boundary, it could replace the first movups with a movaps too.  I'm not sure what to do about
everything being copied to the stack first, though.

I'm sure it's a mammoth task, but I would like to start somewhere with it - however, are there any design
plans that I should be adhering to so I don't end up designing something that is disliked?

Kit