[fpc-pascal] Help translate .SAVENV x86-64 assembly directive

luiz americo pereira camara luizmed at oi.com.br
Mon Oct 1 20:03:32 CEST 2012


I'm porting a Delphi component that uses x86-64 assembly.

So far, most of the asm code compiles and works fine.

The only thing that I could not translate is the .SAVENV directive. See
http://blogs.embarcadero.com/abauer/2011/10/10/38940 and

It seems that it is necessary to save and restore the register with a specific
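For context, here is the rule that makes the save necessary. Under the Microsoft x64 calling convention, XMM0-XMM5 are volatile (caller-saved) and XMM6-XMM15 are nonvolatile (callee-saved). The routine below writes XMM0-XMM6, so XMM6 is the only register it has to preserve. That is the bookkeeping `.SAVENV XMM6` asks the Delphi compiler to generate.

This tiny C sketch only encodes that rule as a bitmask. The constant and function names are hypothetical. In FPC, one possible workaround is to copy XMM6 to a 16-byte local with MOVDQU on entry and copy it back before returning. That workaround is an assumption and has not been checked against FPC's Win64 unwind handling.

```c
#include <stdint.h>

/* Hypothetical constant: bit n set means XMMn is nonvolatile (callee-saved)
   under the Microsoft x64 calling convention, i.e. XMM6..XMM15 (bits 6..15). */
#define WIN64_NONVOLATILE_XMM 0xFFC0u

/* Hypothetical helper: given a bitmask of the XMM registers a routine writes
   (bit n = XMMn), return the subset it must save on entry and restore on exit. */
static uint16_t xmm_regs_to_preserve(uint16_t clobbered)
{
    return (uint16_t)(clobbered & WIN64_NONVOLATILE_XMM);
}
```

For AlphaBlendLineMaster the clobbered mask is XMM0..XMM6 (`0x7F`), and the result is `0x40`: only XMM6.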

I would appreciate any help.

The procedure is below.

The component is VirtualTreeView and full source can be found at

procedure AlphaBlendLineMaster(Source, Destination: Pointer; Count: Integer; ConstantAlpha, Bias: Integer);

// Blends a line of Count pixels from Source to Destination using the source pixel and a constant alpha value.
// The layout of a pixel must be BGRA.
// ConstantAlpha must be in the range 0..255.
// Bias is an additional value which gets added to every component and must be in the range -128..127.


{$ifdef CPU64}
// RCX contains Source
// RDX contains Destination
// R8D contains Count
// R9D contains ConstantAlpha
// Bias is on the stack

        .SAVENV XMM6  // TODO: find out how to implement this in FPC

        // Load XMM3 with the constant alpha value (replicate it for every component).
        // Expand it to word size.
        MOVD        XMM3, R9D    // ConstantAlpha
        PUNPCKLWD   XMM3, XMM3
        PUNPCKLDQ   XMM3, XMM3

        // Load XMM5 with the bias value.
        MOV         R10D, [Bias]
        MOVD        XMM5, R10D
        PUNPCKLWD   XMM5, XMM5
        PUNPCKLDQ   XMM5, XMM5

        // Load XMM4 with 128 to allow for saturated biasing.
        MOV         R10D, 128
        MOVD        XMM4, R10D
        PUNPCKLWD   XMM4, XMM4
        PUNPCKLDQ   XMM4, XMM4

@1:     // The pixel loop calculates an entire pixel in one run.
        // Note: The pixel byte values are expanded into the higher bytes of a word due
        //       to the way unpacking works. We compensate for this with an extra shift.
        MOVD        XMM1, DWORD PTR [RCX]   // data is unaligned
        MOVD        XMM2, DWORD PTR [RDX]   // data is unaligned
        PXOR        XMM0, XMM0    // clear source pixel register for unpacking
        PUNPCKLBW   XMM0, XMM1{[RCX]}     // unpack source pixel byte values into words
        PSRLW       XMM0, 8       // move higher bytes to lower bytes
        PXOR        XMM1, XMM1    // clear target pixel register for unpacking
        PUNPCKLBW   XMM1, XMM2{[RDX]}     // unpack target pixel byte values into words
        MOVQ        XMM2, XMM1    // make a copy of the shifted values, we need them again
        PSRLW       XMM1, 8       // move higher bytes to lower bytes

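        // Load XMM6 with the source alpha value (replicate it for every component).
        // Alpha is word 3 of XMM0; PSHUFLW with selector 255 copies it into words 0..3.
        // (PUNPCKHWD/PUNPCKHDQ on XMM registers read words 4..7, which hold no pixel data here.)
        PSHUFLW     XMM6, XMM0, 255
        PMULLW      XMM6, XMM3    // source alpha * master alpha
        PSRLW       XMM6, 8       // divide by 256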
        PMULLW      XMM6, XMM3    // source alpha * master alpha
        PSRLW       XMM6, 8       // divide by 256

        // calculation is: target = (alpha * master alpha / 256 * (source - target) + 256 * target) / 256
        PSUBW       XMM0, XMM1    // source - target
        PMULLW      XMM0, XMM6    // alpha * (source - target)
        PADDW       XMM0, XMM2    // add target (in shifted form)
        PSRLW       XMM0, 8       // divide by 256

        // Bias is accounted for by converting the range 0..255 to -128..127,
        // doing a saturated add and converting back to 0..255.
        PSUBW       XMM0, XMM4
        PADDSW      XMM0, XMM5
        PADDW       XMM0, XMM4
        PACKUSWB    XMM0, XMM0    // convert words to bytes with saturation
        MOVD        DWORD PTR [RDX], XMM0   // store the result
        ADD         RCX, 4
        ADD         RDX, 4
        DEC         R8D
        JNZ         @1
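To check the asm against something easier to read, here is a scalar C model of what the loop is meant to compute per pixel. It is a sketch, not the VirtualTreeView code. The names are hypothetical, and it assumes the source alpha comes from the A byte (word 3 after unpacking). Each step mirrors one SSE2 instruction on 16-bit lanes, so wrap-around in PMULLW/PADDW and the clamping in PACKUSWB behave as they do in the XMM registers.

```c
#include <stdint.h>

/* Hypothetical scalar model of AlphaBlendLineMaster.
   Pixels are 4 bytes in B, G, R, A order; all four bytes (alpha too) are blended. */

static uint16_t mul_lo16(uint16_t a, uint16_t b)     /* PMULLW: keep the low 16 bits */
{
    return (uint16_t)((uint32_t)a * b);
}

static int16_t add_sat16(int16_t a, int16_t b)       /* PADDSW: signed saturating add */
{
    int32_t r = (int32_t)a + b;
    return (int16_t)(r > 32767 ? 32767 : (r < -32768 ? -32768 : r));
}

static uint8_t pack_us(int16_t w)                    /* PACKUSWB: clamp a word to 0..255 */
{
    return (uint8_t)(w < 0 ? 0 : (w > 255 ? 255 : w));
}

static void alpha_blend_line_master(const uint8_t *src, uint8_t *dst,
                                    int count, int constant_alpha, int bias)
{
    for (int p = 0; p < count; p++, src += 4, dst += 4) {
        /* source alpha * master alpha, divided by 256 (PMULLW, PSRLW 8) */
        uint16_t alpha = (uint16_t)(mul_lo16(src[3], (uint16_t)constant_alpha) >> 8);
        for (int c = 0; c < 4; c++) {
            uint16_t s = src[c], t = dst[c];
            /* alpha * (source - target) + 256 * target, modulo 2^16 (PSUBW, PMULLW, PADDW) */
            uint16_t w = (uint16_t)(mul_lo16((uint16_t)(s - t), alpha) + (uint16_t)(t << 8));
            w >>= 8;                                          /* PSRLW 8 */
            /* bias: shift to -128..127, saturated add, shift back (PSUBW, PADDSW, PADDW) */
            int16_t v = add_sat16((int16_t)(w - 128), (int16_t)bias);
            dst[c] = pack_us((int16_t)(v + 128));             /* PACKUSWB */
        }
    }
}
```

The 16-bit arithmetic never loses information in the blend step. Because alpha is at most 254, `alpha * s + (256 - alpha) * t` stays within 0..65280, so the wrapped PMULLW/PADDW result equals the true value. In the bias step, PADDSW never actually saturates for the documented ranges. The final clamp to 0..255 comes from PACKUSWB.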
