[fpc-devel] FPC Performance: MOVAPD vs MOVSD
Graeme Geldenhuys
graemeg.lists at gmail.com
Wed Oct 19 15:45:00 CEST 2011
Hi,
I was reading this article about Delphi XE2 floating point performance.
http://delphitools.info/2011/09/02/first-look-at-xe2-floating-point-performance/
I don't understand much of the generated assembler, but I did notice that
Delphi XE2 64-bit uses the MOVAPD instruction (introduced in SSE2 [1]),
whereas 64-bit FPC only uses MOVSD, even if I specify -O3 -CfSSE3. I
first took MOVSD to be the instruction introduced with the 386 [2], but
with xmm operands it is the SSE2 scalar move that shares that mnemonic.
So is there room for optimizing FPC a bit more, by reducing the number of
instructions and using faster / newer instructions?
What does each compiler generate for the following two lines?
x := x0 * x0 - y0 * y0 + p;
y := 2 * x0 * y0 + q;
Delphi XE2
-----------
FMandelTest.pas.193: x := x0 * x0 - y0 * y0 + p;
00000000005A1452 660F28C4 movapd xmm0,xmm4
00000000005A1456 F20F59C4 mulsd xmm0,xmm4
00000000005A145A 660F28CD movapd xmm1,xmm5
00000000005A145E F20F59CD mulsd xmm1,xmm5
00000000005A1462 F20F5CC1 subsd xmm0,xmm1
00000000005A1466 F20F58C2 addsd xmm0,xmm2
FMandelTest.pas.194: y := 2 * x0 * y0 + q;
00000000005A146A 660F28CC movapd xmm1,xmm4
00000000005A146E F20F590DA2000000 mulsd xmm1,qword ptr [rel $000000a2]
00000000005A1476 F20F59CD mulsd xmm1,xmm5
00000000005A147A F20F58CB addsd xmm1,xmm3
64-bit FPC 2.5.1
-----------------
# Var x located in register xmm0
# Var x0 located in register xmm2
# Var y located in register xmm0
# Var y0 located in register xmm3
# Var p located in register xmm4
# Var q located in register xmm8
......
.Ll3:
# [17] x := x0 * x0 - y0 * y0 + p;
movsd %xmm0,%xmm5
mulsd %xmm0,%xmm5
.Ll4:
movsd %xmm3,%xmm1
.Ll5:
movsd %xmm1,%xmm0
mulsd %xmm1,%xmm0
subsd %xmm0,%xmm5
addsd %xmm4,%xmm5
movsd %xmm5,-40(%rbp)
.Ll6:
# [18] y := 2 * x0 * y0 + q;
movsd _$FPU_TEST$_Ld1,%xmm0
mulsd %xmm2,%xmm0
mulsd %xmm3,%xmm0
addsd %xmm8,%xmm0
movsd %xmm0,-32(%rbp)
References:
===========
[1] http://en.wikipedia.org/wiki/MOVAPD
[2] http://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_80386
The full Delphi source code for the Mandelbrot test can be downloaded
from:
http://delphitools.info/wp-content/uploads/2011/03/MandelTest.zip
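For a quick check without the whole project, a minimal sketch along these
lines should be enough to make FPC emit assembler for the two statements.
Only those two statements come from the test; the procedure name Iterate,
the starting values and the iteration count are placeholders I made up, so
register allocation may differ from the listing above.

```pascal
program fpu_test;

{$mode objfpc}

{ Hypothetical helper: runs the two statements in a loop. The variables are
  locals so that the compiler is free to keep them in xmm registers. }
procedure Iterate(p, q: Double; count: Integer; out rx, ry: Double);
var
  x, y, x0, y0: Double;
  i: Integer;
begin
  x0 := 0.0;
  y0 := 0.0;
  for i := 1 to count do
  begin
    x := x0 * x0 - y0 * y0 + p;
    y := 2 * x0 * y0 + q;
    { Feed the results back in so the loop is not optimized away }
    x0 := x;
    y0 := y;
  end;
  rx := x0;
  ry := y0;
end;

var
  rx, ry: Double;
begin
  { Placeholder inputs, not taken from MandelTest }
  Iterate(-0.5, 0.25, 10, rx, ry);
  WriteLn(rx:0:6, ' ', ry:0:6);
end.
```

Compiling this on a 64-bit target with "fpc -O3 -CfSSE3 -al fpu_test.pas"
should leave the generated assembler in fpu_test.s, with the source lines
included as comments.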
Regards,
- Graeme -
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/