[fpc-pascal] Floating point question

James Richters james.richters at productionautomation.net
Tue Feb 13 12:34:08 CET 2024


Ok, maybe this example will prove why it's not happening correctly:

program Const_Vs_Var;

Const
   A_const = Integer(8427);
   B_const = Byte(33);
   C_const = Single(1440.5);
   Win_Calc = 16854.045817424505380076362374176;
   Const_Ans = 16854.045817424505380076362374176 / (8427 + 33 / 1440.5);
Var
   A_Var : Integer;
   B_Var : Byte;
   C_Var : Single;
   Const_Ans1, Var_Ans1 : Extended;

Begin
   A_Var := A_Const;
   B_Var := B_Const;
   C_Var := C_Const;

   Var_Ans1   := Win_Calc / (A_Var+B_Var/C_Var);
   Const_Ans1 := Win_Calc / (A_Const+B_Const/C_Const);

   WRITELN ( '  Const_Ans = ',  Const_Ans:20:20);
   WRITELN ( ' Const_Ans1 = ', Const_Ans1:20:20);
   WRITELN ( '   Var_Ans1 = ',   Var_Ans1:20:20);
End.

The result is:
  Const_Ans = 2.00000010627116630224
 Const_Ans1 = 2.00000010627116630224
   Var_Ans1 = 2.00000000000000000000



Now you can see that if the math were done the same way it is done for
variables, the constant could have been stored as Byte(2).  But because the
math is being carried out after the reduction in precision, we are left
storing it as Extended.
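
To isolate where the precision goes, here is a minimal sketch that is
separate from the test program above. The program name and the variables
Q_Single, Q_Ext and C_Ext are made up for illustration. It assumes a target
where Extended is wider than Single (on targets where Extended is an alias
for Double the effect should still show, with fewer digits). It forms the
quotient 33/1440.5 once rounded to Single and once in Extended, then feeds
each into the same final division:

```pascal
program Quotient_Precision;

{ Illustrative sketch only: all names below are hypothetical. }

Const
   Win_Calc = 16854.045817424505380076362374176;
Var
   B        : Byte;
   C        : Single;
   C_Ext    : Extended;
   Q_Single : Single;
   Q_Ext    : Extended;

Begin
   B := 33;
   C := 1440.5;             { 1440.5 is exactly representable in a Single }
   C_Ext := C;              { same value, widened before the division }

   Q_Single := B / C;       { quotient is rounded to Single when stored }
   Q_Ext    := B / C_Ext;   { division carried out with an Extended operand }

   WRITELN ( ' Q_Single = ', Q_Single:20:20);
   WRITELN ( '    Q_Ext = ', Q_Ext:20:20);

   { Same final division, fed with each version of the quotient }
   WRITELN ( 'Using Q_Single: ', (Win_Calc / (8427 + Q_Single)):20:20);
   WRITELN ( 'Using Q_Ext   : ', (Win_Calc / (8427 + Q_Ext)):20:20);
End.
```

If the two final lines differ, the difference comes entirely from rounding
the intermediate quotient, since 1440.5 itself loses nothing when stored as
a Single.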

If the result of all the math can be reduced, or if there is no math, then
it's great to reduce precision.  But if the reduction in precision happens
before the math, you can end up with the opposite of what you intended.
Sure, the compiler gets to work with faster math, but who cares what the
compiler has to do?  Now we're going to be stuck with a program using
extended(2.00000010627116630224) for any calculation that uses Const_Ans
instead of byte(2).  If Const_Ans is used in some kind of iterative process,
the program could be using this extended millions of times when it could
have been using a byte.

Notice when I do the EXACT same math with variables, it DOES give me a
result of 2, and THAT can be reduced.

If the answer after all the math can be reduced, it should be reduced; if it
can't be, then it should not be.

Math with constants should be the same as math with variables.

I'm trying to show there doesn't need to be a trade-off at all.  The math
with constants just needs to be done correctly, as in the exact same way
math with variables is done.

What has happened is that the math with constants was written and tested
with the assumption that all constants would be full precision, because it
was impossible for constants to be anything other than full precision.  Now
that is no longer the case, and the math with constants isn't working
correctly anymore.  Either the math needs to happen before the reduction in
precision, or the math needs to be fixed so it works the same as math with
variables.  Either way there won't need to be a trade-off and everything
will work the way everyone wants it to: performance when possible and
precision when needed.
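
In the meantime, one possible workaround is to keep the compiler from
narrowing the floating point constant before it folds the expression. The
sketch below does that with an explicit Extended typecast on 1440.5. The
names Const_Widened and Const_Ans2 are made up for this example, and it
assumes the typecast is honored during constant folding the same way the
Single(...) typecast is in the test program. The {$MINFPCONSTPREC 64}
directive may be another option where Double precision is enough. Whether
the result comes out as exactly 2 will depend on the compiler version and
target.

```pascal
program Const_Widened;

{ Illustrative sketch only: Const_Ans2 is a hypothetical name. }

Const
   Win_Calc   = 16854.045817424505380076362374176;
   { The typecast asks for 1440.5 to be treated as Extended, so the
     division 33 / 1440.5 is not folded with a Single operand. }
   Const_Ans2 = Win_Calc / (8427 + 33 / Extended(1440.5));

Begin
   WRITELN ( ' Const_Ans2 = ', Const_Ans2:20:20);
End.
```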

James




