[fpc-devel] Dangerous optimization in CASE..OF

Jonas Maebe jonas at freepascal.org
Mon Jul 17 23:50:27 CEST 2017


Martok wrote:
> I would think that
> type
>   TEnum = (a,b,c);
>   TSubEnum = a..c;
>
> should have the same semantics, but at the same time they can't if subranges are
> strict and enums are not. I see now where you're coming from.
> (I'll get back to that example at the end.)
>
> And then there's bitpacked records...

Indeed, and range checking. I mean, if you have the above declarations,
then what should be the behaviour in the following cases:

1)

{$r+}
var
  a,b: TEnum;
begin
  a:=tenum(5);
  b:=a;
end;

2)

{$r+}
type
  tr = bitpacked record
    f: TEnum;
  end;
var
  a: tenum;
  r: tr;
begin
  a:=tenum(5);
  r.f:=a;
end;

3)

(does this trigger a range check error in Delphi?)

{$r+}
var
  arr: array[tenum] of byte;
  a: tenum;
begin
  a:=tenum(5);
  arr[a]:=1;
end;

(and then the same with tenum replaced by tsubenum)

Should these silently truncate the values, trigger range errors (i.e.,
any conversion/assignment from an enum type to itself would have to insert a
range check, rather than only when converting between different types),
or just copy the data (which means that in case 2, enums cannot actually
be bitpacked)?
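
To make the truncation option concrete for case 2: a bitpacked TEnum field
only gets 2 bits (enough for the ordinals 0..2), so "silently truncate" could
at best mean keeping the low 2 bits. A minimal sketch of that arithmetic only
(not of what the compiler currently emits):

const
  BitsForTEnum = 2;
begin
  { 5 = %101; keeping the low 2 bits gives %01 = 1, i.e. Ord(b) }
  Writeln(5 and ((1 shl BitsForTEnum) - 1));
end.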

Defining different behaviour depending on the expression is the road to
madness. After all, in all of these cases, the expression that may give
rise to the range error/truncation is identical: a "conversion" from a
tenum value to the tenum type itself.

> Getting back to the terms Ondrej introduced yesterday, I think that "normal"
> enums may or may not be High-Level enumerations, but enums with explicit
> assignment can *only* be Low-Level enumerations. Can we safely distinguish them
> in the compiler?

Yes.

> Does it even make sense to add that complexity?

I'm not sure.

> But for subranges, they write:
>
> """incrementing or decrementing past the boundary of a subrange simply converts
> the value to the base type."""
> So we can also leave the min..max range and transparently drop to the parent
> type. This raises in $R+, _but is valid otherwise_. (* This is the exact same
> text as in the TP5 langref *)

I guess it's a bit like how with {$Q+} you get overflow errors, and with
{$Q-} you have guaranteed 2's-complement logic (at least on a CPU that
uses 2's complement). On the other hand, it makes subranges completely
useless, unless their declared range results in the compiler allocating
an exact multiple of one memory storage unit (bytes, in our case).
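
For illustration (just a sketch of the storage argument; TSub and the printed
sizes are assumptions about the usual allocation, not a specification):

type
  TSub = 0..5;              { the declared range would fit in 3 bits }
  TPacked = bitpacked record
    f: TSub;                { only here does it actually get 3 bits }
  end;
var
  s: TSub;
  p: TPacked;
  b: Byte;
begin
  Writeln(SizeOf(s));       { 1: a standalone TSub occupies a whole byte }
  Writeln(BitSizeOf(p.f));  { 3: no room for out-of-range base type values }
  b := 200;
  {$R-}
  s := TSub(b);             { with range checks off this just copies the byte }
  Writeln(Ord(s));          { 200: the declared 0..5 bound constrained nothing }
  {$R+}
end.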

Well, unless of course we consider having base types that are not a
multiple of 8 bits (I don't see any definition of what can constitute a
base type on the Delphi page you linked). Then you would also have to
add overflow checking for non-multiple-of-byte-sized types. And in this
case, you would still need to support out-of-range values up to whatever
fits in the number of bits reserved for said base type, but at least it
would make bitpacking possible. OTOH, in terms of safety or simplicity
of implementation, little or nothing would be gained.

> My initial proposed trivial solution was to keep this undefined (maybe document
> the difference to BP), and simply change codegen to be undefined-safe normally
> and only undefined-unsafe in -O4. I am, however, no longer so sure if that is
> really a good solution.

Undefined is never safe. Undefined is something at the semantic level,
which pervades the entire language and compiler. Optimisations merely
perform additional transformations that also honour those semantics. You
cannot say "this is undefined, but safe at -O0". Even if only because it
may no longer be safe even at -O0 the next year, after adding support
for GIMPLE or LLVM output. Even more likely, it may happen because, in
general, check conditions (such as "does this need a range check") and
implementations (such as jump tables, which basically just load an array
entry and then jump to that address) tend to get factored out over time.
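
To make that jump table point concrete with the thread's example (a
hand-written simplification, not actual compiler output; THandler and the
Handle* procedures are made-up names):

{$mode objfpc}
type
  TEnum = (a, b, c);
  THandler = procedure;

procedure HandleA; begin Writeln('a') end;
procedure HandleB; begin Writeln('b') end;
procedure HandleC; begin Writeln('c') end;

const
  { roughly what a case..of jump table is: one code address per label }
  JumpTable: array[TEnum] of THandler = (@HandleA, @HandleB, @HandleC);

var
  e: TEnum;
  h: THandler;
begin
  e := TEnum(5);       { the out-of-range value from the examples above }
  h := JumpTable[e];   { reads slot 5 of a 3-entry table }
  h();                 { and jumps to whatever happened to be there }
end.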

Either something is defined and fully supported, or it's not.
Something in between cannot exist in any sane programming language nor
in a sane implementation of a compiler implementing said language. Of
course, many programmers like to believe that it is in fact possible, and
write their programs based on how one compiler (often a single version of
it) compiles them. And then you indeed get rants on LKML about clang and LLVM
and newer gcc versions, while their code was broken all along.

> There has to be a reason why everybody else chose Low-Level enums, except that
> it is far simpler to implement, right?

I don't know, but I still don't understand why on Earth you would want
them in a strongly typed language.


Jonas


