[fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!

Florian Klämpfl florian at freepascal.org
Tue Oct 22 23:03:46 CEST 2019


On 22.10.19 at 05:01, J. Gareth Moreton wrote:

> 
> Bigger challenges would be optimising the modulus of a complex number:
> 
>    function cmod (z : complex): real; vectorcall;
>      { module : r = |z| }
>      begin
>         with z do
>           cmod := sqrt((re * re) + (im * im));
>      end;
> 
> A perfect compiler with permission to use SSE3 (for haddpd) should 
> generate the following (note that no stack frame is required):
> 
> mulpd    %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
> haddpd    %xmm0, %xmm0 { Adds the above multiplications together (horizontal add) }
> sqrtsd    %xmm0, %xmm0
> ret
> 
> Currently, with vectorcall, the routine compiles into this:
> 
> leaq    -24(%rsp),%rsp
> movdqa    %xmm0,(%rsp)
> movq    %rsp,%rax
> movsd    (%rax),%xmm1
> mulsd    %xmm1,%xmm1
> movsd    8(%rax),%xmm0
> mulsd    %xmm0,%xmm0
> addsd    %xmm1,%xmm0
> sqrtsd    %xmm0,%xmm0
> leaq    24(%rsp),%rsp
> ret
> 
> And without vectorcall (or an unaligned record type):
> 
> leaq    -24(%rsp),%rsp
> movq    %rcx,%rax
> movq    (%rax),%rdx
> movq    %rdx,(%rsp)
> movq    8(%rax),%rax
> movq    %rax,8(%rsp)
> movq    %rsp,%rax
> movsd    (%rax),%xmm1
> mulsd    %xmm1,%xmm1
> movsd    8(%rax),%xmm0
> mulsd    %xmm0,%xmm0
> addsd    %xmm1,%xmm0
> sqrtsd    %xmm0,%xmm0
> leaq    24(%rsp),%rsp
> ret
> 

With a few additions to the compiler (the git patch is less than 500 
lines and not yet ready for committing), I get:

.section .text.n_p$program_$$_cmod$complex$$real,"ax"
	.balign 16,0x90
.globl	P$PROGRAM_$$_CMOD$COMPLEX$$REAL
	.type	P$PROGRAM_$$_CMOD$COMPLEX$$REAL,@function
P$PROGRAM_$$_CMOD$COMPLEX$$REAL:
.Lc2:
# Var $result located in register xmm0
# Var z located in register xmm0
# [test.pp]
# [20] begin
# [22] cmod := sqrt((re * re) + (im * im));
	mulsd	%xmm0,%xmm0
	mulsd	%xmm1,%xmm1
	addsd	%xmm0,%xmm1
	sqrtsd	%xmm1,%xmm0
# Var $result located in register xmm0
.Lc3:
# [23] end;
	ret
.Lc1:
.Le0:
	.size	P$PROGRAM_$$_CMOD$COMPLEX$$REAL, .Le0 - P$PROGRAM_$$_CMOD$COMPLEX$$REAL

The patch mainly keeps records in mm registers; in the listing above, 
the two fields of z are squared in xmm0 and xmm1 without touching the 
stack. I am not sure about the right approach yet, but allocating one 
register to each field of suitable records seems reasonable.
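
For anyone who wants to compare the listings on their own compiler, a 
minimal test program might look like the sketch below. It is not the 
actual test.pp: the record layout is an assumption modelled on the 
complex type from the ucomplex unit, the program name is hypothetical, 
and the vectorcall directive from the quoted function is left out so 
the default calling convention is used.

```pascal
program cmodtest;  { hypothetical name, not the original test.pp }

type
  { Assumed layout: two floating-point fields, as in ucomplex }
  complex = record
    re, im: real;
  end;

function cmod(z: complex): real;
  { modulus: r = |z| }
  begin
    with z do
      cmod := sqrt((re * re) + (im * im));
  end;

var
  z: complex;
begin
  z.re := 3.0;
  z.im := 4.0;
  writeln(cmod(z):0:4);  { prints 5.0000 }
end.
```

Compiling with "fpc -al" keeps the generated assembler file with the 
source lines as comments, which makes it easy to see whether the 
fields of z stay in registers or go through the stack.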

