[fpc-devel] Dangerous optimization in CASE..OF

Jonas Maebe jonas at freepascal.org
Sat Jul 15 18:38:08 CEST 2017


On 15/07/17 17:17, Martok wrote:
> For example, if I index an array, I know bad things may happen if I don't check
> the index beforehand, so I must always do that.

No, you don't always have to do that. That is the whole point of a type 
system.

> That if the compiler makes up the array access somewhere along the way sometimes
> no check happens is not very predictable.

Array indexation is just a side-effect. The basic thing is this:

{$r+}
type
   tenum = (ea,eb,ec,ed,ef,eg);
   tsubenum = eb..ef;
   tsubenum2 = ec..ef;
var
   a: tsubenum;
   b: tsubenum2;
begin
   b:=tsubenum2(eg);
   a:=b;
end.

This will never generate a range check error, because the type 
information states that a tsubenum2 value is always a valid tsubenum 
value. Array indexing is a special case of this, as semantically the 
expression you use to index the array is first assigned to the range 
type of the array.
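
A minimal sketch of the array case, assuming range checking is on as in 
the snippet above (the names tindex, arr, i and n are made up for 
illustration): an index that already has the array's range type needs no 
check, while an integer index is conceptually assigned to that range 
type first, so that is where a check can be inserted.

```pascal
program IndexSketch;
{$r+}
type
  tindex = 2..5;                 { hypothetical index range }
var
  arr: array[tindex] of integer;
  i: tindex;
  n: integer;
begin
  for i := low(tindex) to high(tindex) do
    arr[i] := i * 10;

  i := 3;
  writeln(arr[i]);   { i already has type tindex: no check is needed here }

  n := 4;
  writeln(arr[n]);   { n is an integer: it is first converted to tindex,
                       so this is where a range check belongs }
end.
```

If i ever held an invalid value (for example via an explicit typecast), 
the first access would go unchecked, for the same reason that a:=b goes 
unchecked above.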

I would assume that this is something that "someone with a solid 
knowledge of the language" would expect.

>> and in comparisons that get optimised away at compile time because they will
>> always have the same result at run time according to the type information.
> I've shown that is not the case for the more obvious expressions in the forum
> post linked above.
> Several different ways of writing the (apparent) tautology "is EnumVar in
> Low(EnumType)..High(EnumType)" all handle out-of-range-values (expressly, not as
> a side effect of something else).

The in-expression may indeed handle this, but plain comparisons are 
removed at compile-time:

type
   tsubrange = 6..8;
var
   a: tsubrange;
begin
   a:=tsubrange(10);
   if a>8 then
     writeln('this statement is removed at compile-time, because a > 8 ' +
             'is impossible according to the type information');
end.

It seems we currently do this transformation only for integer subrange 
types and not for enums, but that's a limitation of the implementation 
rather than something that is done by design. The principle is the same.

> Which is especially noteworthy because with
> strict enums, we might as well drop the elseblock entirely and warn "unreachable
> code" in these tests.

Indeed, just like the removal of the comparison above generates a warning.

> However, FPC does not have the luxury of being the first to define and implement
> a new language (well, except for $mode FPC and ObjFPC). There is precedent.

At least the precedent in ISO Pascal 
(http://www.standardpascal.org/iso7185rules.html) is that you cannot 
convert anything else to an enum, and hence an enum by design always 
contains a value that is valid for that type (unless you did not 
initialise it at all, in which case the result is obviously undefined as well).

And for subranges, it says "It is an error to assign a value outside of 
the corresponding range to a variable of that type". Using subrange 
values to calculate something else does promote it to the integer type, 
but we do that too.

The Extended Pascal standard 
(http://www.eah-jena.de/~kleine/history/languages/iso-iec-10206-1990-ExtendedPascal.pdf) 
says that enumeration and subrange types are "non-bindable". This means 
that they cannot be used with input/output (including files; this avoids 
the issue you mentioned with reading invalid values from disk). It does 
not really say much else about enumerated types specifically, but they 
are of course also ordinal types and for those it says in the section 
about Assignment-compatibility (6.4.6):

***
A value of type T2 shall be designated assignment-compatible with a type 
T1 if any of the following six statements is true:
...
d) T1 and T2 are compatible ordinal-types, and the value of type T2 is 
in the closed interval specified by the type T1.
...
At any place where the rule of assignment-compatibility is used
a) it shall be an error if T1 and T2 are compatible ordinal-types and 
the value of type T2 is not in the closed interval specified by the type
T1;
***

That seems pretty clear in terms of stating that having a value that is 
outside the range of a type is an error. And error is defined as:

***
A violation by a program of the requirements of this International 
Standard that a processor is permitted to leave undetected.
***

I.e., undefined behaviour.

It does say that the "range-type" of a subrange-type is the "host-type", 
but this range-type is only referenced in very specific contexts, like 
when defining assignment compatibility (in a non-quoted part of section 
6.4.6 above), and when defining how for-loops must behave (which is a 
place where FPC is in fact in error: 
https://bugs.freepascal.org/view.php?id=24318 )

> And
> that precedent is Conclusion 1 of my post above: Enums are handled as a
> redefinition of the base type with constants for the names. Some intrinsics
> (pred/succ) and the use of the type itself (array[TEnumType], set of) use the
> enum-ness for something, most don't. There is nothing undefined.
> Do not confuse the additional treatment added by {$R+} with the basic defined
> behaviour.

{$r+} can help with detecting when undefined behaviour would otherwise 
occur, like when assigning a value that is out-of-bounds to a subrange 
type or an enum. Explicit typecasting disables this aid. It does not 
remove the undefined behaviour.
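
As a rough sketch of that difference (names are hypothetical, and the 
exact diagnostics may depend on compiler version and optimisation 
settings): the explicit typecast is accepted without a run-time check, 
while the plain assignment of the same value is what range checking is 
there to catch.

```pascal
program CastSketch;
{$r+}
type
  tsubrange = 6..8;
var
  a: tsubrange;
  n: integer;
begin
  n := 10;

  { Explicit typecast: no range check is inserted, so a now holds a
    value that is invalid for its type. Any later use is undefined. }
  a := tsubrange(n);

  { Implicit conversion: with range checking enabled this is expected
    to stop the program with a range check error (run-time error 201). }
  a := n;
end.
```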


Jonas


