[fpc-devel] [Suggestion] Enumeration range-check intrinsic

Thu Jul 4 21:20:32 CEST 2019

On 03/07/2019 09:26, Ondrej Pokorny wrote:
> On 02.07.2019 23:34, Jonas Maebe wrote:
>> Invalid data means undefined behaviour, always. "is" is not a special
>> case that is immune to this.
> 
> Don't you really see the need to handle invalid data with a /defined/
> behavior?

My point is that it is impossible to do so, so trying to do it in a way
that works in some/most cases is much more dangerous than categorically
refusing to try, as it creates a false sense of security.

For example:

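{$mode objfpc}
type
  { a class whose instances are far larger than a single pointer }
  tc = class
    a: array[1..2000] of longint;
  end;

  tcc = class of tc;

  { a record that is only as large as one pointer }
  trec = record
    p: pointer;
  end;
  prec = ^trec;

var
  c: tc;
  p: prec;
  cc: tcc;
begin
  new(p);
  { store tc's class reference (its VMT pointer) in the record's only field }
  cc:=tc;
  p^.p:=pointer(cc);
  { reinterpret the record as a tc instance: "is" only looks at the VMT
    pointer at offset 0, so it cannot tell that this is not a real tc }
  c:=tc(p);
  writeln(c is tc);
  dispose(p);
end.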

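This will print TRUE, even though it's obviously bogus: p^ is a
one-pointer record, not a tc instance, but its first field holds tc's
VMT pointer, which is what "is" inspects.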

> Enum and subrange types /can/ store invalid data and they /do/
> store invalid data.

The same goes for classes, and by extension for any pointer type.

> Be it implicit assignments in object creation,
> clearing records with Default(TMyRecord), reading records from streams
> etc.
> The issue is here and it needs some solution. If you don't like
> "is" for this purpose, why not introduce a compiler intrinsic
> Valid(MyValue) that would check if MyValue is within the allowed range
> of its type?

Because it would mean that not a single transformation or check in the
compiler could assume that MyValue contains a valid value. Either the
compiler can assume validity, in which case the manual check makes no
sense, or it cannot, in which case the language's type system becomes
meaningless (*) and bitpacked arrays/records should be removed from the
language (they only reserve as many bits as the valid range needs, so
they cannot even store an invalid value).

The whole point of a type system is that
a) the programmer is responsible for ensuring that variables contain
valid values when bypassing the type system (explicit typecasts, inline
assembly, variant records, absolute variables, "external" aliases, ...)
b) the compiler is responsible when the type system is not bypassed
(**): either it gives a compile-time error, or it generates run-time
checks that detect issues (which can be disabled for performance reasons)

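As a minimal sketch of point b), assuming FPC 3.x: with range checking enabled ({$R+} in the source, or -Cr on the command line), the compiler inserts a run-time check when a longint is implicitly assigned to a subrange type. TSmall and ToSmall are hypothetical names used only for illustration.

```pascal
{$mode objfpc}
{$R+} // point b): let the compiler insert run-time range checks (same as -Cr)
uses
  SysUtils;  // turns run-time error 201 into an ERangeError exception

type
  TSmall = 1..10;  // hypothetical subrange type for illustration

// The value arrives through a parameter, so the compiler cannot check it at
// compile time and emits a run-time check for the assignment instead.
function ToSmall(i: longint): TSmall;
begin
  Result := i;  // implicit conversion: range-checked because of {$R+}
end;

begin
  writeln(ToSmall(5));      // in range, passes the check
  try
    writeln(ToSmall(20));   // out of range, the inserted check fires
  except
    on E: ERangeError do
      writeln('caught: ', E.ClassName);
  end;
end.
```

Compiling the same program with {$R-} removes the check, which is the "can be disabled for performance reasons" part of point b).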
> Or do you really think Valid(MyValue) should always return true and thus
> be replaced with the true constant at compile time?

Yes.
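If Valid() on a value that already has the enumeration type may be folded to True, the alternative consistent with point a) is to validate the raw integer before it ever becomes an enumeration value, e.g. right after reading it from a stream. A minimal sketch, assuming FPC 3.x; TSuit and TryToSuit are hypothetical and not existing FPC routines.

```pascal
{$mode objfpc}
type
  TSuit = (hearts, diamonds, clubs, spades);  // hypothetical enumeration

// Hypothetical helper: validate the raw value while it is still a plain
// integer, and only convert it to TSuit once it is known to be in range.
// A longint can legitimately hold any of these values, so this check is
// meaningful to the compiler.
// By contrast, testing Ord(s) <= Ord(High(TSuit)) on a value s that is
// already a TSuit may be folded to True, since the compiler is allowed to
// assume that s is valid.
function TryToSuit(raw: longint; out s: TSuit): boolean;
begin
  Result := (raw >= Ord(Low(TSuit))) and (raw <= Ord(High(TSuit)));
  if Result then
    s := TSuit(raw);  // explicit typecast, performed only after the check
end;

var
  s: TSuit;
begin
  // raw values as they might come from a stream or file
  if TryToSuit(2, s) then
    writeln('valid: ', s)
  else
    writeln('rejected');
  if TryToSuit(7, s) then
    writeln('valid: ', s)
  else
    writeln('rejected');
end.
```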

That said, these assumptions can be used in multiple ways. E.g. when
clang appeared, many programmers hated it because it generated
non-working code for many programs that worked with GCC. The reason was
that clang aggressively exploits undefined behaviour for optimization
purposes, and even explicitly models it internally. The original code
may even have worked with one GCC version/platform combination, but
not with another.

Then the clang project introduced ubsan (undefined behaviour
sanitizer). Wherever the compiler makes an assumption, ubsan inserts a
check that verifies at run time that the assumption has not been
invalidated by undefined behaviour, and reports the violation (or aborts
the program) if it has. A ubsan-like mode for FPC would definitely be
useful, even if not all undefined behaviour in FPC is also undefined in
Delphi (or maybe especially for that reason). In a way, the -CR option
is already a bit like that (and it is indeed very useful).

Jonas

(*) The situation with enumeration and subrange types is more
complicated than that. The reason is (as Martok pointed out in a
previous thread) that Delphi explicitly considers the entire range of
representable bit patterns valid for a subrange/enumeration type. This
leads to some inconsistencies and weird range-checking behaviour, but
with that definition an "IsValid()" intrinsic for enumeration and
subrange types would make sense within the type system.

Normally, we would simply follow the Borland behaviour in TP/Delphi
modes and use the stricter definition in FPC modes. Unfortunately,
that is not possible for this particular difference. See
https://forum.lazarus.freepascal.org/index.php/topic,45507.msg322059.html#msg322059
for why.

(**) The Default() intrinsic is indeed nasty in this context. I'm
personally inclined to consider it equivalent to declaring an
uninitialised global variable of the same type, or to having a field of
that type in a class. I.e., the value has been cleared, but is still
"uninitialised" (similar to how FPC warns about an uninitialised use
when reading a global variable without first writing to it).

The reason is not that this is more convenient for supporting the
existing FPC behaviour: I actually started implementing a "SafeDefault"
intrinsic after the previous thread on this topic, and it is not
possible (just like having defined behaviour for invalid data). In this
case, the problem is variant records. You can even have two overlapping
fields of enumeration/subrange types with mutually exclusive ranges, so
no single bit pattern is valid for both.
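A minimal sketch of the variant-record problem, assuming FPC 3.0 or later (for Default()) and that both subrange fields share the same first byte of storage. TLow, THigh and TVariant are hypothetical types chosen only to make the ranges disjoint; the extra raw field just exposes the shared byte.

```pascal
{$mode objfpc}
type
  // Hypothetical subrange types with mutually exclusive ranges
  TLow  = 1..10;
  THigh = 11..20;

  // Variant record: lo, hi and raw all overlay the same storage
  TVariant = record
    case byte of
      0: (lo: TLow);
      1: (hi: THigh);
      2: (raw: byte);
  end;

var
  v: TVariant;
begin
  // Default() zero-fills the record, so the shared storage holds 0,
  // which lies outside both TLow (1..10) and THigh (11..20).
  v := Default(TVariant);
  writeln('raw = ', v.raw);

  // No other fill value would help either: anything in 1..10 is invalid
  // for hi, and anything in 11..20 is invalid for lo.
  writeln('lo valid: ', (v.raw >= Low(TLow)) and (v.raw <= High(TLow)));
  writeln('hi valid: ', (v.raw >= Low(THigh)) and (v.raw <= High(THigh)));
end.
```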