[fpc-devel] Difficulty in specifying record alignment... and more compiler optimisation shenanigans!
Florian Klämpfl
florian at freepascal.org
Tue Oct 22 23:03:46 CEST 2019
On 22.10.19 at 05:01, J. Gareth Moreton wrote:
>
> Bigger challenges would be optimising the modulus of a complex number:
>
> function cmod (z : complex): real; vectorcall;
> { module : r = |z| }
> begin
> with z do
> cmod := sqrt((re * re) + (im * im));
> end;
>
> A perfect compiler with permission to use SSE3 (for haddpd) should
> generate the following (note that no stack frame is required):
>
> mulpd %xmm0, %xmm0 { Calculates "re * re" and "im * im" simultaneously }
> haddpd %xmm0, %xmm0 { Adds the above multiplications together
> (horizontal add) }
> sqrtsd %xmm0, %xmm0
> ret
>
> Currently, with vectorcall, the routine compiles into this:
>
> leaq -24(%rsp),%rsp
> movdqa %xmm0,(%rsp)
> movq %rsp,%rax
> movsd (%rax),%xmm1
> mulsd %xmm1,%xmm1
> movsd 8(%rax),%xmm0
> mulsd %xmm0,%xmm0
> addsd %xmm1,%xmm0
> sqrtsd %xmm0,%xmm0
> leaq 24(%rsp),%rsp
> ret
>
> And without vectorcall (or an unaligned record type):
>
> leaq -24(%rsp),%rsp
> movq %rcx,%rax
> movq (%rax),%rdx
> movq %rdx,(%rsp)
> movq 8(%rax),%rax
> movq %rax,8(%rsp)
> movq %rsp,%rax
> movsd (%rax),%xmm1
> mulsd %xmm1,%xmm1
> movsd 8(%rax),%xmm0
> mulsd %xmm0,%xmm0
> addsd %xmm1,%xmm0
> sqrtsd %xmm0,%xmm0
> leaq 24(%rsp),%rsp
> ret
>
With a few additions to the compiler (the git patch is under 500 lines
and is not ready for committing yet), I get:
.section .text.n_p$program_$$_cmod$complex$$real,"ax"
.balign 16,0x90
.globl P$PROGRAM_$$_CMOD$COMPLEX$$REAL
.type P$PROGRAM_$$_CMOD$COMPLEX$$REAL,@function
P$PROGRAM_$$_CMOD$COMPLEX$$REAL:
.Lc2:
# Var $result located in register xmm0
# Var z located in register xmm0
# [test.pp]
# [20] begin
# [22] cmod := sqrt((re * re) + (im * im));
mulsd %xmm0,%xmm0
mulsd %xmm1,%xmm1
addsd %xmm0,%xmm1
sqrtsd %xmm1,%xmm0
# Var $result located in register xmm0
.Lc3:
# [23] end;
ret
.Lc1:
.Le0:
.size P$PROGRAM_$$_CMOD$COMPLEX$$REAL, .Le0 - P$PROGRAM_$$_CMOD$COMPLEX$$REAL
The patch mainly keeps records in mm registers, as in the listing above,
where the two fields of z occupy %xmm0 and %xmm1. I am not sure about the
right approach yet, but allocating one register to each field of a
suitable record seems reasonable.
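
To make the one-register-per-field idea concrete, here is a minimal source-level sketch of what it amounts to. It does not show how the patch works internally. The record layout (two fields of type real) is assumed from the quoted code, and cmod_fields is a hypothetical name used only for illustration.

```pascal
program cmod_split;

type
  { Assumed layout: two floating-point fields, as used by the quoted cmod. }
  complex = record
    re, im: real;
  end;

{ Original form: the record is passed as one aggregate parameter. }
function cmod(z: complex): real;
begin
  with z do
    cmod := sqrt((re * re) + (im * im));
end;

{ Hypothetical scalarised form: each field is an independent value, }
{ so each one can be assigned its own register. }
function cmod_fields(re, im: real): real;
begin
  cmod_fields := sqrt((re * re) + (im * im));
end;

var
  z: complex;
begin
  z.re := 3.0;
  z.im := 4.0;
  writeln(cmod(z):0:1);
  writeln(cmod_fields(z.re, z.im):0:1);
  { Both forms perform the same arithmetic, so the results must match. }
  if cmod(z) <> cmod_fields(z.re, z.im) then
    halt(1);
end.
```

Once the fields are treated as separate values, there is no need to spill the record to the stack just to read re and im back, which is the main difference between the earlier listings and the one above.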