[fpc-devel] Improving i8086 performance..

Max Nazhalov stein_nospam at mail.ru
Sat Dec 28 15:52:43 CET 2013


On Sat, 28 Dec 2013 01:15:41 +0200, Nikolay Nikolov wrote:
> It looks correct, but I still haven't reviewed the overflow checking part of the 64-bit multiplication routine. I'll commit the patch when I finish that.

Thanks for the effort, Nikolay!
To make the data and decision flow of the overflow checking easier to follow, below is the basic scheme used to construct the assembler code.

WBR, Max.

###################################
64-bit multiplication via 16-bit digits: (A3:A2:A1:A0)*(B3:B2:B1:B0)

//////// STEP 1; break down into 32-bit multiplications, each of which generates a 64-bit result:
  (A3:A2*B3:B2)<<64 + (A3:A2*B1:B0)<<32 + (A1:A0*B3:B2)<<32 + (A1:A0*B1:B0)

(A1:A0*B1:B0) = (A1*B1)<<32 + (A1*B0)<<16 + (A0*B1)<<16 + (A0*B0)
 -- never overflows, forms the base of the final result, name it as "R64"

(A3:A2*B3:B2) is not required for the 64-bit result if overflow is not checked, since it is completely beyond the resulting width.
 -- always overflows if "<>0", so can be checked as "((A2|A3)<>0)&&((B2|B3)<>0)"

(A3:A2*B1:B0) and (A1:A0*B3:B2) are partially required for the final result
 -- to be calculated in steps 2 and 3 as a correction to "R64"
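
The STEP 1 breakdown can be checked with a small model. What follows is only a Python sketch for illustration, not the assembler from the patch; it assumes unsigned operands, and all names (digits16, mul32_via_16, step1_terms, hi_hi_overflows) are made up for this example.

```python
MASK16 = 0xFFFF

def digits16(x):
    """Split a 64-bit unsigned value into 16-bit digits, lowest digit first."""
    return [(x >> (16 * i)) & MASK16 for i in range(4)]

def mul32_via_16(x1, x0, y1, y0):
    """(X1:X0)*(Y1:Y0) from four 16x16->32 products; always fits in 64 bits."""
    return ((x1 * y1) << 32) + ((x1 * y0) << 16) + ((x0 * y1) << 16) + (x0 * y0)

def step1_terms(a, b):
    """Return the four STEP 1 partial products, before their shifts are applied."""
    a0, a1, a2, a3 = digits16(a)
    b0, b1, b2, b3 = digits16(b)
    hh = mul32_via_16(a3, a2, b3, b2)    # weight 2**64: entirely beyond the result
    hl = mul32_via_16(a3, a2, b1, b0)    # weight 2**32: handled in STEP 2
    lh = mul32_via_16(a1, a0, b3, b2)    # weight 2**32: handled in STEP 3
    r64 = mul32_via_16(a1, a0, b1, b0)   # weight 1: the base "R64"
    return hh, hl, lh, r64

def hi_hi_overflows(a, b):
    """The cheap test for the (A3:A2*B3:B2) term: both high halves are non-zero."""
    _, _, a2, a3 = digits16(a)
    _, _, b2, b3 = digits16(b)
    return (a2 | a3) != 0 and (b2 | b3) != 0
```

Summing the shifted terms, (hh<<64) + (hl<<32) + (lh<<32) + r64, gives back the full 128-bit product, and hi_hi_overflows(a, b) agrees with "hh <> 0" without doing any multiplication.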

//////// STEP 2; calculate "R64+=(A3:A2*B1:B0)<<32" (16-bit multiplications, each of which generates a 32-bit result):
  (A3*B1)<<32 + (A3*B0)<<16 + (A2*B1)<<16 + (A2*B0)

((A3*B1)<<32)<<32 is not required for the 64-bit result if overflow is not checked, since it is completely beyond the resulting width.
 -- always overflows if "<>0", so can be checked as "(A3<>0)&&(B1<>0)"

((A3*B0)<<16)<<32: only low word of "A3*B0" contributes to the final result if overflow is not checked.
 -- overflows if the hi_word "<>0"
 -- overflows if R64+(lo_word<<48) produces C-flag

((A2*B1)<<16)<<32: only low word of "A2*B1" contributes to the final result if overflow is not checked.
 -- overflows if the hi_word "<>0"
 -- overflows if R64+(lo_word<<48) produces C-flag

(A2*B0)<<32: the whole dword is significant, name it as "X"
 -- overflows if R64+(X<<32) produces C-flag
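
The STEP 2 rules can be modelled as below. Again this is just a Python sketch under the assumption of unsigned operands; the names step2 and add_with_carry are hypothetical, and the "carry" value stands in for the C-flag of the real code.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def add_with_carry(r64, addend):
    """64-bit addition; returns (sum modulo 2**64, carry flag)."""
    total = r64 + addend
    return total & MASK64, total > MASK64

def step2(r64, a3, a2, b1, b0):
    """Model of R64 += (A3:A2*B1:B0)<<32; returns (new R64, overflow flag)."""
    # (A3*B1)<<64 is never added: it overflows whenever it is non-zero.
    overflow = a3 != 0 and b1 != 0
    # A3*B0 and A2*B1 carry weight 2**48, so only their low word fits.
    for x, y in ((a3, b0), (a2, b1)):
        p = x * y
        overflow |= (p >> 16) != 0               # hi_word would land at bit 64+
        r64, carry = add_with_carry(r64, (p & MASK16) << 48)
        overflow |= carry
    # A2*B0 is "X": the whole dword is added at weight 2**32.
    r64, carry = add_with_carry(r64, (a2 * b0) << 32)
    overflow |= carry
    return r64, overflow
```

STEP 3 has the same shape with the roles of A and B swapped, so step2(r64, b3, b2, a1, a0) would model it as well.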

//////// STEP 3; calculate "R64+=(A1:A0*B3:B2)<<32" (16-bit multiplications, each of which generates a 32-bit result):
  (A1*B3)<<32 + (A1*B2)<<16 + (A0*B3)<<16 + (A0*B2)

((A1*B3)<<32)<<32 is not required for the 64-bit result if overflow is not checked, since it is completely beyond the resulting width.
 -- always overflows if "<>0", so can be checked as "(A1<>0)&&(B3<>0)"

((A1*B2)<<16)<<32: only low word of "A1*B2" contributes to the final result if overflow is not checked.
 -- overflows if the hi_word "<>0"
 -- overflows if R64+(lo_word<<48) produces C-flag

((A0*B3)<<16)<<32: only low word of "A0*B3" contributes to the final result if overflow is not checked.
 -- overflows if the hi_word "<>0"
 -- overflows if R64+(lo_word<<48) produces C-flag

(A0*B2)<<32: the whole dword is significant, name it as "Y"
 -- overflows if R64+(Y<<32) produces C-flag

//////// 16-bit multiplications summary:
  A1*B1
  A1*B0
  A0*B1
  A0*B0
  A3*B0 [only lo_word is needed; overflow if hi_word<>0]
  A2*B1 [only lo_word is needed; overflow if hi_word<>0]
  A2*B0
  A1*B2 [only lo_word is needed; overflow if hi_word<>0]
  A0*B3 [only lo_word is needed; overflow if hi_word<>0]
  A0*B2
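
Putting the ten multiplications together, the whole scheme can be modelled as follows. This is a Python sketch for checking the logic only, assuming unsigned operands; mul64_checked is a hypothetical name and the code is not meant to mirror the register usage or instruction order of the actual routine.

```python
MASK16 = 0xFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mul64_checked(a, b):
    """Unsigned 64x64->64 multiply from ten 16-bit products.

    Returns (low 64 bits of a*b, overflow flag).
    """
    a0, a1, a2, a3 = [(a >> (16 * i)) & MASK16 for i in range(4)]
    b0, b1, b2, b3 = [(b >> (16 * i)) & MASK16 for i in range(4)]

    # Terms lying entirely at bit 64 or above: tested, never multiplied.
    overflow = (((a2 | a3) != 0 and (b2 | b3) != 0)
                or (a3 != 0 and b1 != 0)
                or (a1 != 0 and b3 != 0))

    # R64 = A1:A0 * B1:B0, which never overflows.
    r64 = ((a1 * b1) << 32) + ((a1 * b0) << 16) + ((a0 * b1) << 16) + (a0 * b0)

    def add(term):
        """R64 += term, recording a carry out of bit 63 as overflow."""
        nonlocal r64, overflow
        total = r64 + term
        overflow = overflow or total > MASK64
        r64 = total & MASK64

    # Products with weight 2**48: only the low word is added.
    for x, y in ((a3, b0), (a2, b1), (a1, b2), (a0, b3)):
        p = x * y
        overflow = overflow or (p >> 16) != 0
        add((p & MASK16) << 48)

    # "X" and "Y": whole dwords with weight 2**32.
    add((a2 * b0) << 32)
    add((a0 * b2) << 32)
    return r64, overflow
```

Comparing the result against Python's arbitrary-precision a*b is a convenient way to exercise the corner cases, e.g. operands where only one hi_word or one carry triggers the overflow.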




