[fpc-pascal] Floating point question

Bernd Oppolzer bernd.oppolzer at t-online.de
Tue Feb 13 23:36:21 CET 2024


My opinions about the solutions below ...


On 13.02.2024 at 12:07, Thomas Kurz via fpc-pascal wrote:
>> But, sorry, because we are talking about compile time math, performance (nanoseconds) in this case doesn't count, IMO.
>>
>>
>>
>> That's what I thought at first, too. But then I started thinking about how to deal with it and stumbled upon difficulties very soon:
>>
>> a) 8427.0 + 33.0 / 1440.0
>> An easy case: all constants, so do the calculation at highest precision and reduce it afterwards, if possible.
I agree; I would say:
all constants, so do the calculation at highest precision and reduce it 
afterwards, if required by the target
>>
>> b) var_single + 33.0 / 1440.0
>> Should also be feasible by evaluating the constant expression first, then reducing it to single (if possible) and adding the variable in the end.
Yes: first evaluate the constant expression with maximum precision
(ideally at compile time), then reduce the result.
The reduction to single must be done in any case, because the var_single
in the expression dictates it, IMO.
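
To illustrate case b, here is a minimal Free Pascal sketch of the evaluation order described above. The names (CaseB, c, c_single, sum) are made up for illustration. The typed constant together with the {$MINFPCONSTPREC 64} directive is only a stand-in for "evaluate the constant subexpression in at least double precision"; it is an assumption about how to express the idea by hand, not a description of what the compiler does internally.

```pascal
{$MINFPCONSTPREC 64}
program CaseB;

const
  { Constant subexpression, evaluated in at least double precision
    because of the directive above. }
  c: Double = 33.0 / 1440.0;

var
  var_single, c_single, sum: Single;
begin
  var_single := 8427.0;

  { Reduce the constant result to single once ... }
  c_single := c;

  { ... then add the single variable, so the addition is a
    single-precision operation as dictated by var_single. }
  sum := var_single + c_single;

  WriteLn(sum:0:6);
end.
```

The point of the sketch is only the order of the steps: full-precision constant folding first, a single reduction afterwards, and the variable added last.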
>>
>> c) 8427.0 + var_double / 1440.0
>> Because a double-typed variable is used here, the constants should be treated as double, even at the cost of performance, since it is not known whether the result will be assigned to a single or a double.
yes
>>
>> d) 8427.0 + var_single / 1440.0
>> And this is the one I got to struggle with. And I can imagine this is the reason for the decision about how to handle decimal constants.
>> My first approach would have been to implicitly use single precision values throughout the expression. This would mean losing precision if the result is assigned to a double-precision variable. One could say: "bad luck - if the programmer intended to get better precision, he should have used a double-precision variable as in case c". But this wouldn't be any better than the current state.
8427.0 + (var_single / 1440.0)

The 1440.0 can be reduced to single, because the other operand is single,
and so the whole operation is done using single arithmetic.

If we had an FP constant here instead of var_single, the whole operation
IMO should be done with maximum precision, and at compile time in the
best case. I have no problem with this operation giving a different
result with decimal constants than with explicitly typed (reduced) FP
variables. This can be easily explained to the users: operations
involving FP variables with reduced precision may give reduced precision
results. This seems to be desirable for performance reasons and can be
avoided by appropriate type casting.
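
As a rough illustration of avoiding the reduced precision in case d, the sketch below widens the single operand before it enters the expression. The names (CaseD, widened, r_plain, r_widened) are hypothetical, and the widening is done through an assignment to a Double variable rather than a cast expression, purely to keep the example unambiguous. Whether the two results actually differ depends on the compiler version, its settings and the target, so no output is shown.

```pascal
program CaseD;

var
  var_single: Single;
  widened, r_plain, r_widened: Double;
begin
  var_single := 33.0;

  { Mixed expression: the single operand may pull the arithmetic
    down to single precision, depending on compiler and target. }
  r_plain := 8427.0 + var_single / 1440.0;

  { Widen the operand first, so the expression involves a Double
    variable, as in case c. }
  widened := var_single;
  r_widened := 8427.0 + widened / 1440.0;

  WriteLn('plain:   ', r_plain:0:12);
  WriteLn('widened: ', r_widened:0:12);
end.
```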