[fpc-devel] Same 64bit assembly code compiles under windows but not in linux (fpc 260)

Wed Oct 3 09:50:15 CEST 2012

Hi,
03.10.2012 5:29, luiz americo pereira camara:
[...]
> The complete procedure:
>
> {$ASMMODE INTEL}
>
> procedure AlphaBlendLineConstant(Source, Destination: Pointer; Count:
> Integer; ConstantAlpha, Bias: Integer);
>
> asm
>
> {$ifdef CPU64}
> // RCX contains Source
> // RDX contains Destination
> // R8D contains Count
> // R9D contains ConstantAlpha
> // Bias is on the stack

The procedure declaration above is not specific enough for the assembler 
block listed below, because
- the size of "integer" generally may vary;
- calling convention generally may vary;
I'd suggest to
- replace "integer" by "longint" or "word", according to what assembler 
code expects;
- specify some calling convention according to what assembler code 
expects (IIRC in this case it is cdecl but I'm not completely sure).

HTH
Nikolai

>
>          //.NOFRAME
>
>          // Load XMM3 with the constant alpha value (replicate it for
> every component).
>          // Expand it to word size.
>          MOVD        XMM3, R9D  // ConstantAlpha
>          PUNPCKLWD   XMM3, XMM3
>          PUNPCKLDQ   XMM3, XMM3
>
>          // Load XMM5 with the bias value.
>          MOVD        XMM5, [Bias]
>          PUNPCKLWD   XMM5, XMM5
>          PUNPCKLDQ   XMM5, XMM5
>
>          // Load XMM4 with 128 to allow for saturated biasing.
>          MOV         R10D, 128
>          MOVD        XMM4, R10D
>          PUNPCKLWD   XMM4, XMM4
>          PUNPCKLDQ   XMM4, XMM4
>
> @1:     // The pixel loop calculates an entire pixel in one run.
>          // Note: The pixel byte values are expanded into the higher
> bytes of a word due
>          //       to the way unpacking works. We compensate for this
> with an extra shift.
>          MOVD        XMM1, DWORD PTR [RCX]   // data is unaligned
>          MOVD        XMM2, DWORD PTR [RDX]   // data is unaligned
>          PXOR        XMM0, XMM0    // clear source pixel register for
> unpacking
>          PUNPCKLBW   XMM0, XMM1{[RCX]}    // unpack source pixel byte
> values into words
>          PSRLW       XMM0, 8       // move higher bytes to lower bytes
>          PXOR        XMM1, XMM1    // clear target pixel register for
> unpacking
>          PUNPCKLBW   XMM1, XMM2{[RDX]}    // unpack target pixel byte
> values into words
>          MOVQ        XMM2, XMM1    // make a copy of the shifted values,
> we need them again
>          PSRLW       XMM1, 8       // move higher bytes to lower bytes
>
>          // calculation is: target = (alpha * (source - target) + 256 *
> target) / 256
>          PSUBW       XMM0, XMM1    // source - target
>          PMULLW      XMM0, XMM3    // alpha * (source - target)
>          PADDW       XMM0, XMM2    // add target (in shifted form)
>          PSRLW       XMM0, 8       // divide by 256
>
>          // Bias is accounted for by conversion of range 0..255 to
> -128..127,
>          // doing a saturated add and convert back to 0..255.
>          PSUBW     XMM0, XMM4
>          PADDSW    XMM0, XMM5
>          PADDW     XMM0, XMM4
>          PACKUSWB  XMM0, XMM0      // convert words to bytes with saturation
>          MOVD      DWORD PTR [RDX], XMM0     // store the result
> @3:
>          ADD       RCX, 4
>          ADD       RDX, 4
>          DEC       R8D
>          JNZ       @1
>
> {$endif}
>
> end;
>
>
>
>
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-devel