[fpc-devel] FPC Performance: MOVAPD vs MOVSD

Graeme Geldenhuys graemeg.lists at gmail.com
Wed Oct 19 15:45:00 CEST 2011


Hi,

I was reading this article about Delphi XE2 floating-point performance:


http://delphitools.info/2011/09/02/first-look-at-xe2-floating-point-performance/

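Not that I understand much of the generated assembler, but what I did
notice is that Delphi XE2 64-bit uses the MOVAPD instruction (introduced
in SSE2 [1]) for register-to-register copies, whereas 64-bit FPC only
uses MOVSD, even if I specify -O3 -CfSSE3. Note that the MOVSD mnemonic
is shared: the 386 string instruction is listed in [2], but with XMM
operands, as in the FPC output below, it is the SSE2 scalar
double-precision move.

So is there room for optimizing FPC a bit more, by reducing the number
of instructions and using faster instructions for register copies?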


Here is what each compiler generates for the following two lines:

  x := x0 * x0 - y0 * y0 + p;
  y := 2 * x0 * y0 + q;


Delphi XE2
-----------
FMandelTest.pas.193: x := x0 * x0 - y0 * y0 + p;
00000000005A1452 660F28C4         movapd xmm0,xmm4
00000000005A1456 F20F59C4         mulsd xmm0,xmm4
00000000005A145A 660F28CD         movapd xmm1,xmm5
00000000005A145E F20F59CD         mulsd xmm1,xmm5
00000000005A1462 F20F5CC1         subsd xmm0,xmm1
00000000005A1466 F20F58C2         addsd xmm0,xmm2
FMandelTest.pas.194: y := 2 * x0 * y0 + q;
00000000005A146A 660F28CC         movapd xmm1,xmm4
00000000005A146E F20F590DA2000000 mulsd xmm1,qword ptr [rel $000000a2]
00000000005A1476 F20F59CD         mulsd xmm1,xmm5
00000000005A147A F20F58CB         addsd xmm1,xmm3



64-bit FPC 2.5.1
-----------------
# Var x located in register xmm0
# Var x0 located in register xmm2
# Var y located in register xmm0
# Var y0 located in register xmm3
# Var p located in register xmm4
# Var q located in register xmm8
......
.Ll3:
# [17] x := x0 * x0 - y0 * y0 + p;
	movsd	%xmm0,%xmm5
	mulsd	%xmm0,%xmm5
.Ll4:
	movsd	%xmm3,%xmm1
.Ll5:
	movsd	%xmm1,%xmm0
	mulsd	%xmm1,%xmm0
	subsd	%xmm0,%xmm5
	addsd	%xmm4,%xmm5
	movsd	%xmm5,-40(%rbp)
.Ll6:
# [18] y := 2 * x0 * y0 + q;
	movsd	_$FPU_TEST$_Ld1,%xmm0
	mulsd	%xmm2,%xmm0
	mulsd	%xmm3,%xmm0
	addsd	%xmm8,%xmm0
	movsd	%xmm0,-32(%rbp)
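
For anyone who wants to reproduce a listing like the FPC one above, here
is a minimal sketch of a test program. The program name, the Iterate
function, its parameters and the iteration limit are placeholders of my
own and are not taken from MandelTest.zip; only the two assignments come
from the original test. Compiling it with something like
"fpc -O3 -CfSSE3 -al fpu_test.pas" should leave an annotated fpu_test.s
behind, although the register allocation will probably differ from what
is shown above.

```pascal
program fpu_test;

{$mode objfpc}

{ Hypothetical reproduction: the function name, parameters and the
  escape test are assumptions, not the code from MandelTest.zip. }
function Iterate(p, q: Double; maxIter: Integer): Integer;
var
  x, y, x0, y0: Double;
  i: Integer;
begin
  x0 := 0.0;
  y0 := 0.0;
  Result := maxIter;          // returned if the point never escapes
  for i := 1 to maxIter do
  begin
    // The two statements whose generated code is compared in this post.
    x := x0 * x0 - y0 * y0 + p;
    y := 2 * x0 * y0 + q;

    // Usual Mandelbrot escape condition: |z|^2 > 4.
    if x * x + y * y > 4.0 then
    begin
      Result := i;
      Exit;
    end;

    x0 := x;
    y0 := y;
  end;
end;

begin
  // Print the results so the compiler cannot discard the loop.
  WriteLn(Iterate(0.0, 0.0, 100));
  WriteLn(Iterate(2.0, 2.0, 100));
end.
```

Keeping the two statements inside a loop whose result is printed is
deliberate: it stops the optimizer from removing the floating-point
work, so the multiplications and register copies stay visible in the
generated assembler file.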





References:
===========
[1] http://en.wikipedia.org/wiki/MOVAPD
[2] http://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_80386




The full Delphi source code for the Mandelbrot test can be downloaded
from:

   http://delphitools.info/wp-content/uploads/2011/03/MandelTest.zip



Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



