[fpc-pascal] Floating point question
James Richters
james.richters at productionautomation.net
Sat Feb 3 18:42:46 CET 2024
I don't understand it either. The result of 33/1440 is apparently being stored in single precision, but it shouldn't be.
If TT is Double or Extended, then every part of the evaluation of TT should be carried out the same way, whether it is evaluated
by the compiler or by the program. That is what I expect, but that is not what is happening.
program TESTDBL1 ;
Const
  TT_Const = 8427 + 33 / 1440.0 ;
  SS_Const = 8427 + Double(33 / 1440.0) ;
Var
  AA_Double : Double;
  BB_Double : Double;
  CC_Double : Double;
  DD_Double : Double;
  EE_Double : Double;
  FF_Double : Double;
  GG_Double : Double;
  HH_Double : Double;
  II_Double : Double;
  JJ_Double : Double;
  KK_Double : Double;
  SS_Double : Double;
  TT_Double : Double;
  VV_Single : Single;
  WW_Single : Single;
  XX_Single : Single;
  YY_Single : Single;
  ZZ_Single : Single;
begin
  AA_Double := 8427;
  BB_Double := 33/1440;
  CC_Double := AA_Double+BB_Double;
  DD_Double := 8427 + 33 / 1440.0 ;
  VV_Single := 8427;
  WW_Single := 33/1440;
  XX_Single := VV_Single+WW_Single;
  YY_Single := 8427 + 33 / 1440.0 ;
  ZZ_Single := DD_Double;
  EE_Double := Double(8427 + 33 / 1440.0) ;
  FF_Double := 8427 + Double(33 / 1440.0) ;
  GG_Double := Double(8427) + Double(33) / Double(1440.0) ;
  HH_Double := Double(8427 + Single(33 / 1440.0)) ;
  II_Double := 33;
  JJ_Double := 1440;
  KK_Double := AA_Double+II_Double/JJ_Double;
  SS_Double := SS_Const;
  TT_Double := TT_Const;
  WRITELN ( 'AA_Double := 8427; =' , AA_Double : 20 : 20 ) ;
  WRITELN ( 'BB_Double := 33/1440; =' , BB_Double : 20 : 20 ) ;
  WRITELN ( 'CC_Double := AA_Double+BB_Double; =' , CC_Double : 20 : 20 ) ;
  WRITELN ( 'DD_Double := 8427 + 33 / 1440.0 ; =' , DD_Double : 20 : 20 ) ;
  WRITELN ( 'VV_Single := 8427; =' , VV_Single : 20 : 20 ) ;
  WRITELN ( 'WW_Single := 33/1440; =' , WW_Single : 20 : 20 ) ;
  WRITELN ( 'XX_Single := VV_Single+WW_Single; =' , XX_Single : 20 : 20 ) ;
  WRITELN ( 'YY_Single := 8427 + 33 / 1440.0 ; =' , YY_Single : 20 : 20 ) ;
  WRITELN ( 'ZZ_Single := DD_Double; =' , ZZ_Single : 20 : 20 ) ;
  WRITELN ( 'EE_Double := Double(8427 + 33 / 1440.0) ; =' , EE_Double : 20 : 20 ) ;
  WRITELN ( 'FF_Double := 8427 + Double(33 / 1440.0) ; =' , FF_Double : 20 : 20 ) ;
  WRITELN ( 'GG_Double := Double(8427) + Double(33) / Double(1440.0) ; =' , GG_Double : 20 : 20 ) ;
  WRITELN ( 'HH_Double := Double(8427 + Single(33 / 1440.0)) ; =' , HH_Double : 20 : 20 ) ;
  WRITELN ( 'KK_Double := AA_Double+II_Double/JJ_Double; =' , KK_Double : 20 : 20 ) ;
  WRITELN ( 'TT_Const = 8427 + 33 / 1440.0 ; =' , TT_Const : 20 : 20 ) ;
  WRITELN ( 'SS_Const = Double(8427 + 33 / 1440.0); =' , SS_Const : 20 : 20 ) ;
  WRITELN ( 'TT_Double := TT_Const; =' , TT_Double : 20 : 20 ) ;
  WRITELN ( 'SS_Double := SS_Const; =' , SS_Double : 20 : 20 ) ;
end.
AA_Double := 8427; =8427.00000000000000000000
BB_Double := 33/1440; =0.02291666666666666500
CC_Double := AA_Double+BB_Double; =8427.02291666666680000000
DD_Double := 8427 + 33 / 1440.0 ; =8427.02246093750000000000
VV_Single := 8427; =8427.00000000000000000000
WW_Single := 33/1440; =0.02291666716000000000
XX_Single := VV_Single+WW_Single; =8427.02246100000000000000
YY_Single := 8427 + 33 / 1440.0 ; =8427.02246100000000000000
ZZ_Single := DD_Double; =8427.02246100000000000000
EE_Double := Double(8427 + 33 / 1440.0) ; =8427.02246093750000000000
FF_Double := 8427 + Double(33 / 1440.0) ; =8427.02291666716340000000
GG_Double := Double(8427) + Double(33) / Double(1440.0) ; =8427.02291666666680000000
HH_Double := Double(8427 + Single(33 / 1440.0)) ; =8427.02246093750000000000
KK_Double := AA_Double+II_Double/JJ_Double; =8427.02291666666680000000
TT_Const = 8427 + 33 / 1440.0 ; =8427.02246100000000000000
SS_Const = Double(8427 + 33 / 1440.0); =8427.02291666716340000000
TT_Double := TT_Const; =8427.02246093750000000000
SS_Double := SS_Const; =8427.02291666716340000000
I would actually expect values that are calculated by the compiler to ALWAYS be computed in Extended, with only the final answer reduced to fit into a smaller variable.
If that were the case, then ALL of the results would be 8427.0229…
This may be debatable, but when the result is to be stored in a Double, then every operation calculated by the compiler should certainly also be stored in Doubles. I don't see how anything else could be argued to be correct.
This is not the case at all, or DD, EE, FF, and GG would all be 8427.0229… but only FF and GG are, and only because I explicitly cast to Double there (and FF still carries the single-precision error of the division in its later digits).
When the program executes and does the math at run time, as with BB, CC, and KK, it's always correct, but when the compiler evaluates the expression, it does it wrong, storing portions of the calculation in a Single even when the final result is a Double.
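The 8427.0224… figure is consistent with the sum having been rounded to single precision. A back-of-the-envelope check, assuming IEEE 754 single precision with a 24-bit significand (this is an editorial worked example, not output from the test program):

```latex
\begin{gather*}
2^{13} = 8192 \le 8427 < 16384 = 2^{14}
  \;\Rightarrow\; \text{spacing of singles near } 8427 = 2^{13-23} = 2^{-10} = 0.0009765625 \\
\frac{33}{1440} \cdot 2^{10} \approx 23.47
  \;\Rightarrow\; \text{nearest representable fraction} = 23 \cdot 2^{-10} = 0.0224609375 \\
8427 + 0.0224609375 = 8427.0224609375 \\
\left(\frac{33}{1440} - 0.0224609375\right) \cdot 86400 = 39.375 \text{ seconds}
\end{gather*}
```

The third line matches the DD, EE, HH, and TT_Double output above, and the last line matches the roughly 40-second TDateTime discrepancy reported earlier in the thread.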
The compiler should ALWAYS use the highest precision possible, because a high-precision result can always be stored in a reduced-precision variable, but once it's been butchered by low precision, it can't be fixed.
Constants are also evaluated wrong. You don't know what a constant is going to be used for, so all steps of evaluating a constant MUST be done in Extended by the compiler, or the answer is just wrong.
TT_Const and SS_Const should have been the same, so that when assigned to double variables TT_Double and SS_Double they would also be the same. TT_Double and TT_Const are wrong.
I think this is a legitimate bug you have discovered. I shouldn't have to cast the division; that's not something any user would expect to need to do.
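A possible workaround sketch, short of casting every term: the FPC documentation describes a `{$MINFPCONSTPREC}` directive (command-line option `-CF`) that sets the minimum precision used for floating point constants. Assuming that directive governs the compile-time evaluation seen above, which has not been verified against this compiler build, something like the following may give the expected 8427.0229… without any casts. The program name TESTDBL2 is made up for illustration.

```pascal
program TESTDBL2 ;
(* Assumption: this directive makes the compiler treat floating point *)
(* constants such as 1440.0 as at least 64-bit (Double) values.       *)
{$MINFPCONSTPREC 64}
var
  TT : Double ;
begin
  (* Same expression as DD_Double in the test program, without casts. *)
  TT := 8427 + 33 / 1440.0 ;
  WRITELN ( 'tt=' , TT : 20 : 20 ) ;
end.
```

If the assumption holds, compiling the original test program with `-CF64` should have the same effect without touching the source.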
My tests were done on a Windows 10 64 bit machine with FPC Win32.
■ Free Pascal IDE Version 1.0.12 [2023/06/26]
■ Compiler Version 3.3.1-12875-gadf843196a
James
-----Original Message-----
From: fpc-pascal <fpc-pascal-bounces at lists.freepascal.org> On Behalf Of Thomas Kurz via fpc-pascal
Sent: Friday, February 2, 2024 4:37 PM
To: FPC-Pascal users discussions <fpc-pascal at lists.freepascal.org>
Cc: Thomas Kurz <fpc.2021 at t-net.ruhr>
Subject: Re: [fpc-pascal] Floating point question
Well, 8427.0229...., that's what I want.
But what I get is 8427.0224....
And that's what I don't understand.
----- Original Message -----
From: Bernd Oppolzer via fpc-pascal <fpc-pascal at lists.freepascal.org>
To: Bart via fpc-pascal <fpc-pascal at lists.freepascal.org>
Sent: Sunday, January 28, 2024, 10:13:07
Subject: [fpc-pascal] Floating point question
To simplify the problem further:
the addition of 12 / 24.0 and the subtraction of 0.5 should be removed, IMO, because both can be done in floating point without loss of precision (0.5 can be represented exactly as a float).
So the problem can be reproduced IMO with this small Pascal program:
program TESTDBL1 ;
var TT : REAL ;
begin (* MAIN PROGRAM *)
  TT := 8427 + 33 / 1440.0 ;
  WRITELN ( 'tt=' , TT : 20 : 20 ) ;
end (* MAIN PROGRAM *) .
With my compiler, REAL is always DOUBLE, and the computation is carried out by a P-Code interpreter (or call it just-in-time compiler - much like Java), which is written in C.
The result is:
tt=8427.02291666666678790000
and it is the same, no matter if I use this simplified computation or the original
tt := (8427 - 0.5) + (12 / 24.0) + (33 / 1440.0);
My value is between the two other values:
tt=8427.02291666666680000000
tt=8427.02291666666678790000
ee=8427.02291666666666625000
The problem now is:
the printout of my value suggests an accuracy that in fact is not there, because with double you can trust only the first 16 decimal digits; everything after that is speculative, a.k.a. wrong. That's why, IMO, FPC rounds at this point, prints the 8, and then only zeroes.
The extended format internally has more hex digits and therefore can reliably show more decimal digits.
But its last two digits are wrong, too (the exact value is 6 repeating).
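The digit counts behind this can be estimated from the significand widths. A rough sketch, assuming IEEE 754 double (53-bit significand) and x87 extended (64-bit significand):

```latex
\begin{gather*}
\log_{10} 2^{53} = 53 \log_{10} 2 \approx 15.95 \quad \text{(double: about 16 decimal digits)} \\
\log_{10} 2^{64} = 64 \log_{10} 2 \approx 19.27 \quad \text{(extended: about 19 decimal digits)}
\end{gather*}
```

This fits the ee value quoted above, which is correct for roughly its first 19 significant digits before the trailing 25 departs from the repeating 6.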
HTH,
kind regards
Bernd
On 27.01.2024 at 22:53, Bart via fpc-pascal wrote:
> On Sat, Jan 27, 2024 at 6:23 PM Thomas Kurz via fpc-pascal
> <fpc-pascal at lists.freepascal.org> wrote:
>> Hmmm... I don't think I can understand that. If the precision of "double" were that bad, it wouldn't be possible to store dates up to a precision of milliseconds in a TDateTime. I have a discrepancy of 40 seconds here.
> Consider the following simplified program:
> ====
> var
> tt: double;
> ee: extended;
> begin
> tt := (8427 - Double(0.5)) + (12/ Double(24.0)) +
> (33/Double(1440.0)) + (0/Double(86400.0));
> ee := (8427 - Extended(0.5)) + (12/ Extended(24.0)) +
> (33/Extended(1440.0)) + (0/Extended(86400.0));
> writeln('tt=',tt:20:20);
> writeln('ee=',ee:20:20);
> end.
> ===
> Now see what it outputs:
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>fpc test.pas
> Free Pascal Compiler version 3.2.2 [2021/05/15] for i386 ...
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>test
> tt=8427.02291666666680000000
> ee=8427.02291666666666625000
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>fpc -Px86_64 test.pas
> Free Pascal Compiler version 3.2.2 [2021/05/15] for x86_64 ..
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>test
> tt=8427.02291666666680000000
> ee=8427.02291666666680000000
> On Win64 both values are the same, because there Extended = Double.
> On Win32 the Extended version is a bit closer to the exact solution:
> 8427 - 1/2 + 1/2 + 33/1440 = 8427 + 11/480
> Simple as that.
> Bart