[fpc-pascal] Floating point question
James Richters
james.richters at productionautomation.net
Tue Feb 20 19:36:42 CET 2024
>>If you're using Win64, then the answer is simple: x86_64-win64 unlike any
>>other x86 target does not support Extended, so neither the compiler nor the
>>code in runtime will ever calculate anything with that precision.
To clarify, I am using i386-win32 on a 64-bit system specifically because
Extended is really just a Double on x86_64-win64. All my test programs were
done with Win32.
>>you see the pattern? You simply have to rotate the six digits in a certain
>>manner ...
I see it now that you pointed it out and I think that it is really cool that
it's the same digits rotated! Thanks!
>>I don't think you need the cast to extended around the divisions;
Correct, I don't need to do this and I should not need to do it, but I also
should not need to re-cast the terms of the division. With constants,
however, that's what I must do to get the correct result.
Also, my programs would never re-cast the constants, I was just making it
clear that a byte divided by a single in fact does produce a correct
extended answer when done with variables, and I didn't want any doubt that
it was dividing a byte by a single.
This is the correct behavior. My point was that the problem isn't that the
3.5 was stored or cast as a single. It's valid for it to be a single, and
that should make no difference at all; in fact my results are exactly the
same without all the casting.
But the only way to get the correct answer with constants is to do what
should be an unnecessary cast to extended of terms in my expression:
program Const_Vs_Var;
Const
A_const = 1;
B_const = 3.5;
Var
A_Var : Byte;
B_Var : Single;
Const_Ans1, Var_Ans1, Difference1 : Extended;
Const_Ans2, Var_Ans2, Difference2 : Extended;
Begin
A_Var := A_Const;
B_Var := B_Const;
Const_Ans1 := A_Const/B_Const;
Var_Ans1 := A_Var/B_Var;
Difference1 := Var_Ans1-Const_Ans1;
Const_Ans2 := A_Const/Extended(B_Const); //I should not need to cast B_Const in this way
Difference2 := Var_Ans1-Const_Ans2;
WRITELN ( ' Const_Ans1 = ', Const_Ans1);
WRITELN ( ' Var_Ans1 = ', Var_Ans1);
WRITELN ( ' Difference1 = ', Difference1);
Writeln;
WRITELN ( ' Const_Ans2 = ', Const_Ans2);
WRITELN ( ' Var_Ans1 = ', Var_Ans1);
WRITELN ( ' Difference2 = ', Difference2);
End.
Const_Ans1 = 2.85714298486709594727E-0001 //This is a single precision calculation stored in an extended
Var_Ans1 = 2.85714285714285714282E-0001 //The nice repeating decimal I expect
Difference1 = -1.27724238804447203649E-0008 //This should have been 0
Const_Ans2 = 2.85714285714285714282E-0001 //I should not have had to cast Extended(B_Const) to get this
Var_Ans1 = 2.85714285714285714282E-0001 //The correct answer again, just for clarification
Difference2 = 0.00000000000000000000E+0000 //Now it is 0 as I expected
>>When casting this way
>>Byte(A_Var)/Single(B_Var)
>>I would expect the division to be done with single precision, but
>>apparently it is done using extended (or another) precision ... on
>>Windows, not on Linux. And this is what causes your headaches.
I would NOT expect this to result in single precision. When I divide a Byte
by a single in Turbo Pascal and assign it to a Double, the result is correct
in double precision.
The ONLY way to get a single for an answer in Turbo Pascal is to define a
variable as a single and use that to do that calculation.
MySingle := A_Var/B_Var;
If the variable is a double:
MyDouble := A_Var/B_Var;
Then no matter what B_Var is, whether it's a single or a double, MyDouble is
the same correct number.
If I want the result to be a single then:
Single(A_Var/B_Var)
Should be what I require.
It doesn't matter in Turbo Pascal whether I am dividing by a variable
defined as a single, a variable defined as a double, or an untyped constant,
and it does not matter in FPC either as long as my division is done with
variables.
It's FPC with Constants that is making MyByte/MySingle come out as a single,
and that is incorrect. Division by a single does not force the answer to
be a single.
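A minimal sketch of the variable case described above, for anyone who wants to try it. The program name and the variables MySingle, MyDouble and Difference are only illustrative, and it assumes the same i386-win32 target used for the other tests in this message. No output is shown because it depends on the target and compiler version.

```pascal
program Var_Result_Types;
Var
   A_Var : Byte;
   B_Var : Single;
   MySingle : Single;
   MyDouble : Double;
   Difference : Double;
Begin
   A_Var := 1;
   B_Var := 3.5;
   MySingle := A_Var/B_Var;          // stored in a Single, so at most single precision survives
   MyDouble := A_Var/B_Var;          // same expression, stored in a Double
   Difference := MyDouble-MySingle;  // non-zero if the division kept more than single precision
   WRITELN ( ' MySingle   = ', MySingle);
   WRITELN ( ' MyDouble   = ', MyDouble);
   WRITELN ( ' Difference = ', Difference);
End.
```

If the argument above holds, only the type of the destination variable should limit the precision here, not the fact that B_Var is a single.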
Please see this:
program Const_Vs_Var;
Const
A_const = 1;
B_Const = 3.5;
C_Const = 7;
Var
A_Var : Byte;
B_Var : Single;
C_Var : Byte;
VPi : Extended;
Const_Ans1, Var_Ans1, Difference1 : Extended;
Const_Ans2, Var_Ans2, Difference2 : Extended;
Const_Ans3, Var_Ans3, Difference3 : Extended;
Begin
A_Var := A_Const;
B_Var := B_Const;
C_Var := C_Const;
VPi := Pi;
Const_Ans1 := A_Const/B_Const;
Var_Ans1 := A_Var/B_Var;
Difference1 := Var_Ans1-Const_Ans1;
Const_Ans2 := A_Const/C_Const;
Var_Ans2 := A_Var/C_Var;
Difference2 := Var_Ans2-Const_Ans2;
Const_Ans3 := Pi/B_Const;
Var_Ans3 := VPi/B_Var;
Difference3 := Var_Ans3-Const_Ans3;
WRITELN ( ' Const_Ans1 = ', Const_Ans1);
WRITELN ( ' Var_Ans1 = ', Var_Ans1);
WRITELN ( ' Difference1 = ', Difference1);
Writeln;
WRITELN ( ' Const_Ans2 = ', Const_Ans2);
WRITELN ( ' Var_Ans2 = ', Var_Ans2);
WRITELN ( ' Difference2 = ', Difference2);
Writeln;
WRITELN ( ' Const_Ans3 = ', Const_Ans3);
WRITELN ( ' Var_Ans3 = ', Var_Ans3);
WRITELN ( ' Difference3 = ', Difference3);
End.
Const_Ans1 = 2.85714298486709594727E-0001
Var_Ans1 = 2.85714285714285714282E-0001
Difference1 = -1.27724238804447203649E-0008
Const_Ans2 = 1.42857142857142857141E-0001
Var_Ans2 = 1.42857142857142857141E-0001
Difference2 = 0.00000000000000000000E+0000
Const_Ans3 = 8.97597901025655211019E-0001
Var_Ans3 = 8.97597901025655211019E-0001
Difference3 = 0.00000000000000000000E+0000
If division by a single forced the answer to be a single, then Const_Ans3
:= Pi/B_Const; would produce a single.
But it does not; it correctly produces extended results with both variables
and constants.
Const_Ans1 := A_Const/B_Const; is incorrectly being evaluated as a single
and I don't know why.
The fact that there is something in Difference1 but not Difference2 or
Difference3 is why I think there is a bug somewhere.
I am expecting my byte divided by a single to be an extended for both
variables and constants, but only the variable version is working.
Surely if a Byte divided by a byte can result in an extended, then a Byte
divided by a single should also result in an extended.
It's like this for everything: you can add a byte to a byte and get a word
for an answer.
The result of any math can always have higher precision than all of the
terms involved;
otherwise there would be no point to double precision or extended precision.
I have never needed to cast my terms to get expected results.
Are we really supposed to do:
MyConstant = Extended(2.5)/Extended(3.5); to get it to be an extended?
MyConstant = 2.5/3.5 should result in an extended, I shouldn't need the
casting.
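A small sketch of that comparison, using the same cast as the Extended(B_Const) example earlier in this message. The program name and the identifiers Num_Const, Den_Const, Plain_Ans and Cast_Ans are only illustrative, and it assumes a target where Extended is a real 80-bit type (such as i386-win32). No output is shown because it depends on the target and compiler version.

```pascal
program Const_Cast_Sketch;
Const
   Num_Const = 2.5;
   Den_Const = 3.5;
Var
   Plain_Ans, Cast_Ans, Difference : Extended;
Begin
   Plain_Ans := Num_Const/Den_Const;                      // no casts: precision is chosen by the compiler
   Cast_Ans := Extended(Num_Const)/Extended(Den_Const);   // both terms explicitly cast to Extended
   Difference := Cast_Ans-Plain_Ans;                      // 0 only if both divisions used the same precision
   WRITELN ( ' Plain_Ans  = ', Plain_Ans);
   WRITELN ( ' Cast_Ans   = ', Cast_Ans);
   WRITELN ( ' Difference = ', Difference);
End.
```

A non-zero Difference would show the same constant-versus-cast discrepancy as Difference1 in the programs above.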
James