[fpc-devel] Dangerous optimization in CASE..OF
Jonas Maebe
jonas at freepascal.org
Sat Jul 15 18:38:08 CEST 2017
On 15/07/17 17:17, Martok wrote:
> For example, if I index an array, I know bad things may happen if I don't check
> the index beforehand, so I must always do that.
No, you don't always have to do that. That is the whole point of a type
system.
> That if the compiler makes up the array access somewhere along the way sometimes
> no check happens is not very predictable.
Array indexation is just a side-effect. The basic thing is this:
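{$r+}
type
  tenum = (ea,eb,ec,ed,ef,eg);
  tsubenum = eb..ef;
  tsubenum2 = ec..ef;
var
  a: tsubenum;
  b: tsubenum2;
begin
  { eg lies outside ec..ef; the explicit typecast bypasses the range check }
  b:=tsubenum2(eg);
  { no range check here: every tsubenum2 value is assumed to be a valid tsubenum }
  a:=b;
end.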
This will never generate a range check error, because the type
information states that a tsubenum2 value is always a valid tsubenum
value. Array indexing is a special case of this, as semantically the
expression you use to index the array is first assigned to the range
type of the array.
I would assume that this is something that "someone with a solid
knowledge of the language" would expect.
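
As a minimal sketch of the array case (the names IndexSketch, arr and idx are made up for illustration and do not come from the thread): the index type tsubenum2 is a subrange of the array's range type tsubenum, so according to the type information the implicit assignment of idx to the range type can never fail and needs no run-time check, even under {$r+}.

```pascal
program IndexSketch;
{$r+}
type
  tenum = (ea, eb, ec, ed, ef, eg);
  tsubenum = eb..ef;
  tsubenum2 = ec..ef;
var
  arr: array[tsubenum] of integer;  { range type of the array is tsubenum }
  idx: tsubenum2;                   { every valid tsubenum2 is a valid tsubenum }
begin
  idx := ec;
  { semantically, idx is first assigned to tsubenum; by the type
    information that assignment is always in range }
  arr[idx] := 1;
  writeln(arr[idx]);
end.
```

Here idx holds a valid value, so the behaviour is fully defined. If idx were forced out of range with an explicit typecast, as in the example above, the same array access would silently index outside the array.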
>> and in comparisons that get optimised away at compile time because they will
>> always have the same result at run time according to the type information.
> I've shown that is not the case for the more obvious expressions in the forum
> post linked above.
> Several different ways of writing the (apparent) tautology "is EnumVar in
> Low(EnumType)..High(EnumType)" all handle out-of-range-values (expressly, not as
> a side effect of something else).
The in-expression may indeed handle this, but plain comparisons are
removed at compile-time:
type
  tsubrange = 6..8;
var
  a: tsubrange;
begin
  a:=tsubrange(10);
  if a>8 then
    writeln('this statement is removed at compile-time, because a > 8 ' +
            'is impossible according to the type information');
end.
It seems we currently perform this transformation only for integer
subranges and not for enums, but that is a limitation of the implementation
rather than something that is done by design. The principle is the same.
> Which is especially noteworthy because with
> strict enums, we might as well drop the elseblock entirely and warn "unreachable
> code" in these tests.
Indeed, just like the removal of the comparison above generates a warning.
> However, FPC does not have the luxury of being the first to define and implement
> a new language (well, except for $mode FPC and ObjFPC). There is precedent.
At least the precedent in ISO Pascal
(http://www.standardpascal.org/iso7185rules.html) is that you cannot
convert anything else to an enum, and hence an enum by design always
contains a value that is valid for that type (unless you did not
initialise it at all, in which case the result is obviously undefined as well).
And for subranges, it says "It is an error to assign a value outside of
the corresponding range to a variable of that type". Using subrange
values to calculate something else does promote it to the integer type,
but we do that too.
The Extended Pascal standard
(http://www.eah-jena.de/~kleine/history/languages/iso-iec-10206-1990-ExtendedPascal.pdf)
says that enumeration and subrange types are "non-bindable". This means
that they cannot be used with input/output (including files; this avoids
the issue you mentioned with reading invalid values from disk). It does
not really say much else about enumerated types specifically, but they
are of course also ordinal types and for those it says in the section
about Assignment-compatibility (6.4.6):
***
A value of type T2 shall be designated assignment-compatible with a type
T1 if any of the following six statements is true:
...
d) T1 and T2 are compatible ordinal-types, and the value of type T2 is
in the closed interval specified by the type T1.
...
At any place where the rule of assignment-compatibility is used
a) it shall be an error if T1 and T2 are compatible ordinal-types and
the value of type T2 is not in the closed interval specified by the type
T1;
***
That seems pretty clear in terms of stating that having a value that is
outside the range of a type is an error. And error is defined as:
***
A violation by a program of the requirements of this International
Standard that a processor is permitted to leave undetected.
***
I.e., undefined behaviour.
It does say that the "range-type" of a subrange-type is the "host-type",
but this range-type is only referenced in very specific contexts, like
when defining assignment compatibility (in a non-quoted part of section
6.4.6 above), and when defining how for-loops must behave (which is a
place where FPC is in fact in error:
https://bugs.freepascal.org/view.php?id=24318 )
> And
> that precedent is Conclusion 1 of my post above: Enums are handled as a
> redefinition of the base type with constants for the names. Some intrinsics
> (pred/succ) and the use of the type itself (array[TEnumType], set of) use the
> enum-ness for something, most don't. There is nothing undefined.
> Do not confuse the additional treatment added by {$R+} with the basic defined
> behaviour.
{$r+} can help with detecting when undefined behaviour would otherwise
occur, like when assigning a value that is out-of-bounds to a subrange
type or an enum. Explicit typecasting disables this aid. It does not
remove the undefined behaviour.
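
One way to stay within defined behaviour when the value comes from an untrusted source is to validate the ordinal while it is still an integer, and only then typecast. The following is a sketch under that assumption; TryIntToEnum is a hypothetical helper written for this illustration, not an existing RTL routine.

```pascal
program SafeCastSketch;
type
  tenum = (ea, eb, ec, ed, ef, eg);

{ Hypothetical helper: sets e and returns true only if raw is the
  ordinal of a declared tenum value; otherwise leaves e untouched. }
function TryIntToEnum(raw: longint; var e: tenum): boolean;
var
  ok: boolean;
begin
  { the comparison is done on integers, so it cannot be optimised
    away on the basis of the enum's type information }
  ok := (raw >= ord(low(tenum))) and (raw <= ord(high(tenum)));
  if ok then
    e := tenum(raw);
  TryIntToEnum := ok;
end;

var
  e: tenum;
begin
  e := ea;
  if TryIntToEnum(6, e) then
    writeln('valid: ', ord(e))
  else
    writeln('6 is not a valid tenum ordinal');
end.
```

The check happens before the typecast, so the enum variable never holds an out-of-range value and later code (case statements, array indexing, comparisons) can rely on the type information.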
Jonas