[fpc-pascal] Floating Point Performance on Intel
Florian Klaempfl
florian at freepascal.org
Mon Mar 28 18:12:15 CEST 2005
Raw Magick DOT COM wrote:
> Hi All,
>
> My name is Peter Dove, I am new to FPC and Lazarus. I come from a
> mainly Delphi background but I use C, C++ and assembler as needed to
> improve performance on the imaging app we are working on.
>
> Like Delphi, FPC has a poor floating point optimisation situation in
> comparison to similar compiles in C. For instance the following code
> in Pascal
>
> A := 0;
> B := 0.9;
> For X := 0 to 10000000 do
> begin
> A := A + X;
> A := A * B;
> end;
>
> Takes some 220ms to perform. The major problem with the performance is
> the poor loop optimisation and register usage, along with wasted loads
> from and stores to memory. Below is the assembler output from FPC; all
> optimisations were enabled.
The problem with such optimizations is that the compiler usually knows
too little about a program, so they apply too seldom and aren't worth
the effort to implement. E.g. your assembler assumes that a:=a*b can't
throw an exception, but it can, and the compiler isn't allowed to
assume that it doesn't.
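
To illustrate the point that a floating point multiplication can raise an
exception, here is a minimal sketch. It assumes an x86 target and a current
FPC release; the program name and the use of ParamCount (only there to
keep the compiler from folding the product at compile time) are choices
made for this illustration, not something from the original code. Whether
the exception actually fires depends on the target and on the FPU
exception mask, so the program reports both outcomes.

```pascal
program FloatExceptionDemo;
{$mode objfpc}
uses
  SysUtils, Math;
var
  A, B: Single;
begin
  // Assumption: explicitly unmask overflow so the demo does not depend
  // on the default exception mask of the platform.
  SetExceptionMask(GetExceptionMask - [exOverflow]);

  A := MaxSingle;
  // ParamCount is only known at run time, so A * B cannot be
  // evaluated by the compiler.
  B := 2.0 + ParamCount;

  try
    A := A * B;  // the product does not fit into a Single
    WriteLn('No exception raised, A = ', A);
  except
    on E: EMathError do
      WriteLn('Multiplication raised ', E.ClassName);
  end;
end.
```

If the exception is raised, the assignment to A inside the try block never
completes, which is the kind of observable behaviour an optimizer has to
preserve when it reorders or removes stores.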
>
> # Var A located at ebp-4
> # Var B located at ebp-8
> # Var X located at ebp-12
>
> // A and B are set up before here - it's the loop that's interesting
>
> # [44] For X := 0 to 10000000 do
> movl $0,-12(%ebp)
> decl -12(%ebp)
> .balign 4
> .L31:
> incl -12(%ebp)
> # [46] A := A + X;
> flds -4(%ebp)
> fildl -12(%ebp)
> faddp %st,%st(1)
> fstps -4(%ebp)
> # [47] A := A * B;
> flds -8(%ebp)
> fmuls -4(%ebp)
> fstps -4(%ebp)
> cmpl $10000000,-12(%ebp)
> jl .L31
>
> My comments on this are that
>
> a) The loop counter is basically a comparison against a memory area =
> slow
Well, you need to write the counter to memory as well, so this
shouldn't count for much.
> b) There are some unnecessary loads from memory occurring = slow
>
> The above code takes about 210ms to perform on my machine. Below is my
> own assembler which takes about 100ms ( apologies it is in a slightly
> different format )
>
> asm
> mov eax, 0; //Set up loop counter
> @StartOfLoop:
> mov dword ptr[x], eax; // Move its value into X ( on stack )
> FILD dword ptr[x]; //Load into floating point
> FADD dword ptr[A]; // Add A ( on Stack ) to it
> FMUL dword ptr[B]; //Multiply by B ( on Stack )
> FSTP dword ptr[A]; // Pop into A
> add eax, 1; //Inc loop counter
> cmp eax, 10000000; // Test Jump condition
> jl @StartOfLoop;
> end;
>
> My question is, what needs to be done to the compiler to make it
The compiler needs a proper lifetime analysis of expressions.
> optimise as well as C compilers,
See above, this is often not possible in Pascal.
> or perhaps I am missing some compiler
> switches.
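
For anyone who wants to reproduce the timing of the loop from the question,
a small test program along these lines should do. This is a sketch under
assumptions: A and B are declared as Single and X as LongInt because the
assembler listing uses flds/fstps and fildl on 4-byte stack slots, the
program name is made up, and GetTickCount64 from SysUtils is assumed to be
available (it is in current FPC releases). The measured time will depend on
the machine and on the optimization switches used.

```pascal
program FloatLoopBench;
{$mode objfpc}
uses
  SysUtils;
var
  A, B: Single;
  X: LongInt;
  StartTick: QWord;
begin
  A := 0;
  B := 0.9;

  StartTick := GetTickCount64;
  for X := 0 to 10000000 do
  begin
    A := A + X;
    A := A * B;
  end;

  // Printing A keeps the result observable, so the loop cannot be
  // dropped as dead code.
  WriteLn('A = ', A:0:2);
  WriteLn('Elapsed: ', GetTickCount64 - StartTick, ' ms');
end.
```

Compiling it with -al (or -a) keeps the generated assembler file, so the
loop body can be compared with the listing quoted above.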