[fpc-pascal] Floating Point Performance on Intel

Mon Mar 28 18:12:15 CEST 2005

Raw Magick DOT COM wrote:

> Hi All,
> 
> My name is Peter Dove, I am new to FPC and Lazarus. I come from a
> mainly Delphi background but I use C, C++ and assembler as needed to
> improve performance on the imaging app we are working on.
> 
> Like Delphi, FPC has a poor floating point optimisation situation in
> comparison to similar compiles in C. For instance the following code
> in Pascal
> 
>      A := 0;
>      B := 0.9;
>      For X := 0 to 10000000 do
>      begin
>           A := A + X;
>           A := A * B;
>      end;
> 
> Takes some 220ms to perform. The major problems with the performance are
> the poor loop optimisation and register usage, along with wasted pushes
> and pulls from memory. Below is the assembler output
> from FPC - all optimisations were enabled.

The problem with such optimizations is that the compiler usually knows
too little about a program, so such optimizations apply too seldom and
aren't worth the effort of implementing. E.g. your assembler assumes
that a:=a*b can't throw an exception, but it can, and the compiler isn't
allowed to assume that it doesn't.
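
A minimal sketch of the exception point, not taken from the original
program: it assumes the Math and SysUtils units are available, and it
unmasks the overflow exception explicitly instead of relying on the
default FPU mask. The program name and the values are made up for
illustration. Whether and where the exception is reported depends on the
target and the FPU settings.

```pascal
program FloatExceptionSketch;
{$mode objfpc}

uses
  SysUtils, Math;

var
  a, b: Single;

begin
  // Assumption: make sure floating point overflow is not masked,
  // whatever the platform default happens to be.
  SetExceptionMask(GetExceptionMask - [exOverflow]);

  a := 3.0e38;   // close to the largest finite Single
  b := 2.0;
  try
    a := a * b;  // the product does not fit into a Single
    WriteLn('no exception, a = ', a);
  except
    on E: EMathError do
      WriteLn('a := a * b raised ', E.ClassName);
  end;
end.
```

If the store back to a can raise, the compiler may not freely reorder or
drop it, which is one reason it keeps writing A to memory in the loop.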

> 
> # Var A located at ebp-4
> # Var B located at ebp-8
> # Var X located at ebp-12
> 
> // A and B are set up before here - it's the loop that's interesting
> 
> # [44] For X := 0 to 10000000 do
>         movl    $0,-12(%ebp)
>         decl    -12(%ebp)
>         .balign 4
> .L31:
>         incl    -12(%ebp)
> # [46] A := A + X;
>         flds    -4(%ebp)
>         fildl   -12(%ebp)
>         faddp   %st,%st(1)
>         fstps   -4(%ebp)
> # [47] A := A * B;
>         flds    -8(%ebp)
>         fmuls   -4(%ebp)
>         fstps   -4(%ebp)
>         cmpl    $10000000,-12(%ebp)
>         jl      .L31
> 
> My comments on this are that
> 
> a) The loop counter is basically a comparison against a memory area =
> slow

Well, you need to write the counter to memory as well, so this
shouldn't matter much.

> b) There are some unnecessary loads from memory occurring = slow
> 
> The above code takes about 210ms to perform on my machine. Below is my
> own assembler which takes about 100ms ( apologies it is in a slightly
> different format )
> 
> asm
>    mov eax, 0; //Set up loop counter
>    @StartOfLoop:
>    mov dword ptr[x], eax; // Move its value into X ( on stack )
>    FILD dword ptr[x]; //Load into floating point
>    FADD dword ptr[A]; // Add A ( on Stack ) to it
>    FMUL dword ptr[B]; //Multiply by B ( on Stack )
>    FSTP dword ptr[A]; // Pop into A
>    add eax, 1; //Inc loop counter
>    cmp eax, 10000000; // Test Jump condition
>    jle @StartOfLoop; // jle so X = 10000000 is included, as in the Pascal loop
> end;
> 
> My question is, what needs to be done to the compiler to make it

The compiler needs a proper lifetime analysis of expressions.

> optimise as well as C compilers, 

See above, this is often not possible in Pascal.

> or perhaps I am missing some compiler
> switches.