[fpc-pascal] Efficiency of generated code [x86_64]

Jonas Maebe jonas.maebe at elis.ugent.be
Sat Jun 25 10:38:31 CEST 2011

On 24 Jun 2011, at 19:16, Peter wrote:

> # [7] For X := 0  to 10000000 do
>    movl    $0,%eax
>    decl    %eax
>    .balign 4,0x90
> .Lj7:
>    incl    %eax
> # [9] A := A + X;
>    cvtsi2sdl    %eax,%xmm2
>    addsd    %xmm0,%xmm2
>    movsd    %xmm2,%xmm0
> # [10] A := A * B;
>    movsd    %xmm0,%xmm2
>    mulsd    %xmm1,%xmm2
>    movsd    %xmm2,%xmm0
>    cmpl    $10000000,%eax
>    jl    .Lj7
> # [14] end;
>    movsd    %xmm0,%xmm0
>    addq    $24,%rsp
>    ret
>
> I am wondering what is the point of all the xmm2 stuff

Variable A is a register variable. While evaluating "A + X" and "A * B", the code generator does not know that the final result will be stored back into A (nor that "A" won't be used again before the final result is written back), so it must make sure that A is not destroyed while performing these calculations.
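
As a rough illustration, the loop from the quoted listing can be written out with the temporaries made explicit. This is only a sketch: the function name Accumulate, the temporary T, the initial value of A and the value passed for B are assumptions, since only source lines [7], [9], [10] and [14] are visible in the listing. T stands in for xmm2; each expression is evaluated into such a temporary because the code generator does not yet know that the destination is A itself.

```pascal
program XmmTemps;
{$mode objfpc}

{ Hypothetical reconstruction: only the loop header and the two
  assignments are visible in the quoted listing. }
function Accumulate(B: Double): Double;
var
  A, T: Double;   { T plays the role of xmm2 }
  X: LongInt;
begin
  A := 0.0;       { assumed initial value }
  for X := 0 to 10000000 do
  begin
    T := A + X;   { cvtsi2sdl %eax,%xmm2 + addsd %xmm0,%xmm2 }
    A := T;       { movsd %xmm2,%xmm0 }
    T := A * B;   { movsd %xmm0,%xmm2 + mulsd %xmm1,%xmm2 }
    A := T;       { movsd %xmm2,%xmm0 }
  end;
  Result := A;
end;

begin
  WriteLn(Accumulate(0.5));   { 0.5 is an arbitrary value for B }
end.
```

Written this way, it is visible that each statement costs at least one extra register-to-register move compared with operating on A in place.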

Such inefficiencies are usually solved for integer code on i386 (and to some extent on PowerPC) by the peephole optimizer. There is no peephole optimizer for x86-64, though (and none for SSE code, not even on i386). Most of those register transfers are nevertheless pretty much free (processors rename registers internally all the time, even if you don't use explicit register moves in your code), except that they increase the icache pressure somewhat.

> Also puzzled by the final
> movsd %xmm0,%xmm0
> What does this do?

It probably means that the register size of one xmm0 is not the same as that of the other inside the compiler (e.g., one may specifically represent a 64 bit double while the other may represent "the entire xmm register"). The compiler only removes transfers between registers of exactly the same size, since otherwise some conversion may be going on; this optimization is performed by generic code that has no clue about the specific meaning of "movsd". It means that the size of either xmm0 register should be specified more precisely somewhere in the compiler.

> I would really like to be able to generate optimal (ie minimal) xmm code from Pascal without dropping into assembler. Are there any other compiler switches that would help?


