[fpc-devel] Same 64bit assembly code compiles under windows but not in linux (fpc 260)
Nikolai Zhubr
n-a-zhubr at yandex.ru
Wed Oct 3 09:50:15 CEST 2012
Hi,
03.10.2012 5:29, luiz americo pereira camara:
[...]
> The complete procedure:
>
> {$ASMMODE INTEL}
>
> procedure AlphaBlendLineConstant(Source, Destination: Pointer; Count:
> Integer; ConstantAlpha, Bias: Integer);
>
> asm
>
> {$ifdef CPU64}
> // RCX contains Source
> // RDX contains Destination
> // R8D contains Count
> // R9D contains ConstantAlpha
> // Bias is on the stack
The procedure declaration above is not specific enough for the assembler
block listed below, because
- the size of "integer" generally may vary;
- calling convention generally may vary;
I'd suggest to
- replace "integer" by "longint" or "word", according to what assembler
code expects;
- specify some calling convention according to what assembler code
expects (IIRC in this case it is cdecl but I'm not completely sure).
HTH
Nikolai
>
> //.NOFRAME
>
> // Load XMM3 with the constant alpha value (replicate it for
> every component).
> // Expand it to word size.
> MOVD XMM3, R9D // ConstantAlpha
> PUNPCKLWD XMM3, XMM3
> PUNPCKLDQ XMM3, XMM3
>
> // Load XMM5 with the bias value.
> MOVD XMM5, [Bias]
> PUNPCKLWD XMM5, XMM5
> PUNPCKLDQ XMM5, XMM5
>
> // Load XMM4 with 128 to allow for saturated biasing.
> MOV R10D, 128
> MOVD XMM4, R10D
> PUNPCKLWD XMM4, XMM4
> PUNPCKLDQ XMM4, XMM4
>
> @1: // The pixel loop calculates an entire pixel in one run.
> // Note: The pixel byte values are expanded into the higher
> bytes of a word due
> // to the way unpacking works. We compensate for this
> with an extra shift.
> MOVD XMM1, DWORD PTR [RCX] // data is unaligned
> MOVD XMM2, DWORD PTR [RDX] // data is unaligned
> PXOR XMM0, XMM0 // clear source pixel register for
> unpacking
> PUNPCKLBW XMM0, XMM1{[RCX]} // unpack source pixel byte
> values into words
> PSRLW XMM0, 8 // move higher bytes to lower bytes
> PXOR XMM1, XMM1 // clear target pixel register for
> unpacking
> PUNPCKLBW XMM1, XMM2{[RDX]} // unpack target pixel byte
> values into words
> MOVQ XMM2, XMM1 // make a copy of the shifted values,
> we need them again
> PSRLW XMM1, 8 // move higher bytes to lower bytes
>
> // calculation is: target = (alpha * (source - target) + 256 *
> target) / 256
> PSUBW XMM0, XMM1 // source - target
> PMULLW XMM0, XMM3 // alpha * (source - target)
> PADDW XMM0, XMM2 // add target (in shifted form)
> PSRLW XMM0, 8 // divide by 256
>
> // Bias is accounted for by conversion of range 0..255 to
> -128..127,
> // doing a saturated add and convert back to 0..255.
> PSUBW XMM0, XMM4
> PADDSW XMM0, XMM5
> PADDW XMM0, XMM4
> PACKUSWB XMM0, XMM0 // convert words to bytes with saturation
> MOVD DWORD PTR [RDX], XMM0 // store the result
> @3:
> ADD RCX, 4
> ADD RDX, 4
> DEC R8D
> JNZ @1
>
> {$endif}
>
> end;
>
>
>
>
> _______________________________________________
> fpc-devel maillist - fpc-devel at lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-devel
More information about the fpc-devel
mailing list