[fpc-pascal] Efficiency of generated code [x86_64]
Peter
peter at pblackman.plus.com
Fri Jun 24 19:16:29 CEST 2011
Hi,
I'm puzzled by some of the code generated for x64. I came across this
earlier post:
http://www.hu.freepascal.org/lists/fpc-pascal/2005-March/008175.html
Compiling the simple example with the loop in a function, I get much
leaner & meaner code than the [i386] assembler in the original post, but I
had to use -O3 and that separate function to get full use of xmm
registers instead of the stack.
Program tttt;

Function loop (A,B : double) : double;
Var X : LongInt;
Begin

  For X := 0 to 10000000 do
    Begin
      A := A + X;
      A := A * B;
    End;

  loop := A;
End;

Var A,B : double;

Begin
  A := 0;
  B := 0.9;
  loop (A,B);
  WRITELN (loop (A,B):0:9);
end.
Looking at the assembler code for the loop:
# Var A located in register xmm0
# Var B located in register xmm1
# Var $result located in register xmm0
# Var X located in register eax // AND xmm2 !
# [7] For X := 0 to 10000000 do
movl $0,%eax
decl %eax
.balign 4,0x90
.Lj7:
incl %eax
# [9] A := A + X;
cvtsi2sdl %eax,%xmm2
addsd %xmm0,%xmm2
movsd %xmm2,%xmm0
# [10] A := A * B;
movsd %xmm0,%xmm2
mulsd %xmm1,%xmm2
movsd %xmm2,%xmm0
cmpl $10000000,%eax
jl .Lj7
# [14] end;
movsd %xmm0,%xmm0
addq $24,%rsp
ret
I am wondering what the point of all the xmm2 traffic is, apart from the
initial transfer of X from %eax. I have not set any debugging options.
What is wrong with the following?
# [9] A := A + X;
cvtsi2sdl %eax,%xmm2
addsd %xmm2,%xmm0
# [10] A := A * B;
mulsd %xmm1,%xmm0
cmpl $10000000,%eax
jl .Lj7
# [14] end;
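To make the comparison concrete, here is a Pascal-level sketch of the two
sequences. It is my own illustration, not compiler output, and the names
LoopTemp, LoopDirect and T are made up for this example. LoopTemp routes
every result through a temporary, the way the generated code uses xmm2.
LoopDirect updates A in place, as in the shorter sequence above. The
arithmetic operations are the same in both, so I would expect the two
printed values to agree.

```pascal
Program tttt2;

{ Mirrors the generated code: each result goes through the temporary T }
Function LoopTemp (A,B : double) : double;
Var X : LongInt;
    T : double;
Begin
  For X := 0 to 10000000 do
    Begin
      T := X;        { cvtsi2sdl %eax,%xmm2 }
      T := T + A;    { addsd %xmm0,%xmm2 }
      A := T;        { movsd %xmm2,%xmm0 }
      T := A;        { movsd %xmm0,%xmm2 }
      T := T * B;    { mulsd %xmm1,%xmm2 }
      A := T;        { movsd %xmm2,%xmm0 }
    End;
  LoopTemp := A;
End;

{ Mirrors the shorter sequence: A is updated in place }
Function LoopDirect (A,B : double) : double;
Var X : LongInt;
Begin
  For X := 0 to 10000000 do
    Begin
      A := A + X;    { cvtsi2sdl %eax,%xmm2 then addsd %xmm2,%xmm0 }
      A := A * B;    { mulsd %xmm1,%xmm0 }
    End;
  LoopDirect := A;
End;

Begin
  WRITELN (LoopTemp (0,0.9):0:9);
  WRITELN (LoopDirect (0,0.9):0:9);
end.
```

The only difference between the two is the extra copying, which is exactly
the part I would like the compiler to drop.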
I am also puzzled by the final
movsd %xmm0,%xmm0
What does this do?
I would really like to be able to generate optimal (i.e. minimal) xmm code
from Pascal without dropping into assembler. Are there any other
compiler switches that would help?
Peter