# [fpc-pascal] Efficiency of generated code [x86_64]

Peter peter at pblackman.plus.com
Fri Jun 24 19:16:29 CEST 2011

```
Hi,

I'm puzzled by some of the code generated for x64. I came across this
earlier post:
http://www.hu.freepascal.org/lists/fpc-pascal/2005-March/008175.html

Compiling the simple example with the loop in a function, I get much
leaner and meaner code than the [i386] assembler in the original post,
but I had to use -O3 and a separate function to get full use of the xmm
registers instead of the stack.

Program tttt;

Function loop (A,B : double) : double;
Var X : LongInt;
Begin
For X := 0  to 10000000 do
Begin
A := A + X;
A := A * B;
End;
loop := A;
End;

Var A,B : double;
Begin
A := 0;
B := 0.9;
loop (A,B);
WRITELN (loop (A,B):0:9);
end.

Looking at the assembler loop code

# Var A located in register xmm0
# Var B located in register xmm1
# Var $result located in register xmm0
# Var X located in register eax    //   AND xmm2 !

# [7] For X := 0  to 10000000 do
movl    $0,%eax
decl    %eax
.balign 4,0x90
.Lj7:
incl    %eax
# [9] A := A + X;
cvtsi2sdl    %eax,%xmm2
movsd    %xmm2,%xmm0
# [10] A := A * B;
movsd    %xmm0,%xmm2
mulsd    %xmm1,%xmm2
movsd    %xmm2,%xmm0
cmpl    $10000000,%eax
jl    .Lj7
# [14] end;
movsd    %xmm0,%xmm0
ret

What is the point of all the xmm2 shuffling, apart from the initial
conversion of X from %eax?  I have not set any debugging options.
What is wrong with the following?

# [9] A := A + X;
cvtsi2sdl  %eax,%xmm2
# [10] A := A * B;
mulsd  %xmm1,%xmm0
cmpl    $10000000,%eax
jl    .Lj7
# [14] end;

I am also puzzled by the final
movsd %xmm0,%xmm0
What does this do?

I would really like to be able to generate optimal (i.e. minimal) xmm code
from Pascal without dropping into assembler. Are there any other
compiler switches that would help?

Peter
```
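
The post notes that the loop had to live in a separate function before the xmm registers were used instead of the stack. Below is a minimal sketch for reproducing that comparison in one file. The program name `looptest`, the function name `LoopInFunction` and the globals `GA`, `GB`, `GX` are hypothetical and do not appear in the original post. It assumes the usual FPC behaviour that only parameters and local variables are candidates for register allocation, which is an assumption worth checking against the assembler your compiler version actually emits.

```pascal
program looptest;

{ Variant 1: the loop works on parameters and a local counter,
  as in the original post. }
function LoopInFunction(A, B: Double): Double;
var
  X: LongInt;
begin
  for X := 0 to 10000000 do
  begin
    A := A + X;
    A := A * B;
  end;
  LoopInFunction := A;
end;

var
  { Hypothetical global names, used only for the comparison. }
  GA, GB: Double;
  GX: LongInt;

begin
  { Variant 2: the same arithmetic on global variables in the main block. }
  GA := 0;
  GB := 0.9;
  for GX := 0 to 10000000 do
  begin
    GA := GA + GX;
    GA := GA * GB;
  end;

  WriteLn('main block: ', GA:0:9);
  WriteLn('function:   ', LoopInFunction(0, 0.9):0:9);
end.
```

Compiling with something like `fpc -O3 -al looptest.pas` should leave an assembler listing annotated with source lines, so the two loops can be compared side by side.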