[fpc-devel] Const optimization is a serious bug
Chad Berchek
ad100 at vobarian.com
Thu Jul 7 07:15:54 CEST 2011
I have some observations on the discussion so far. The biggest question
is what the intended behavior is.
Martin wrote:
> Well, I have pointed out myself,in my mail, that it probably needs
> more documentation. I do not know if it is documented or not.
>
> But it is the answer, I have gotten several times from developers in
> the FPC team. So for all I know it is the intended behaviour. At
> least intended in FPC (and apparently either intended or at least
> implemented in Delphi). So if there is no documentation for it, then
> it would appear a problem of documentation, rather than a bug in the
> compiler (Again all based on the statements I was given)
Thaddy wrote:
> It is a contract between the compiler and the programmer in which it
> is expected that the string will not be modified inside a procedure,
> function or method.
This is the crux of the controversy. I realized this when I was writing
the original post, but did not mention it explicitly because I thought
it would come up anyway.
The difference between a feature and a bug is the specifications. Here
the specifications are the documentation. I have not found any
documentation in either FPC or Delphi that there is some implicit
contract whereby the programmer promises not to modify other variables
which happen to refer to the same instance as a const parameter. Many
people have repeatedly stated that this is the programmer's fault. If
there is an implicit agreement with the programmer, then yes I agree
with these statements and I believe it is not a compiler bug (although
certainly not good language design).
However I am looking for documentation. Has anyone found anything yet?
If anyone can find anything I would be pleased as it would settle the
question. But lacking any documentation, I don't see how anyone should
know there is an implicit contract. To me, a const parameter means that
you cannot modify that parameter by pointing it to something else, nor
(in the case of strings and dynamic arrays) alter the contents by means
of said parameter. (Although, you can't really "alter" the contents of
the instance, as copy-on-write simply creates a whole new instance.)
That's what it means in other languages I've used, and nothing more. I
don't see it as implying anything else. Furthermore, as many examples
have shown, the programmer often /cannot/ know whether several variables
refer to the same instance, since the handling of creating and
destroying instances, copy-on-write, etc., is handled by the compiler
and is considered an implementation detail that should be opaque to the
programmer.
I don't know how many of you have actually looked at the demo I posted,
but here is the crucial part. The demo program contains this line:
FCurrentDriverName := Edit1.Text;
In this state, the program works perfectly. However, if this line is
changed as follows:
FCurrentDriverName := Edit1.Text + 'abc';
the program then crashes. IMHO, this is very scary. All you have to do
is make a tiny, harmless change and suddenly a working program crashes.
Also in the demo you will notice that the programmer doesn't even call a
function that takes a const parameter; the problem is caused by setting
a parameter, and it just so happens that behind the scenes the
parameter's setter takes a const parameter.
Unless there is some documentation I am unaware of, I don't agree with
the implicit contract theory. Instances and variables are not the same.
People confuse them because that is actually exactly the point of the
Pascal construct: the compiler creates the illusion that each variable
is an instance. This is why there is copy on write; so you can modify a
variable and it doesn't modify other variables that (prior to
modification) referred to the same instance. But again it is only an
illusion. In the implementation, a variable (or parameter) and an
instance of an automatic type are not the same, and that is where the
problem is rooted. The management of these is an opaque implementation
detail. The programmer cannot be expected to know whether or not the
compiler has chosen to use the same instance for two
variables/parameters, and yet that is what the implicit contract theory
states.
As in C, Java, etc., if I have a const variable, that means it's a const
variable/parameter; i.e., I can't change it to point to something else.
It doesn't carry any implications about other variables that may be
pointing to the same instance. If the "implicit const contract" is
indeed true, then I agree there is not a compiler bug, just a poorly
conceived language feature (please note I most certainly am not trying
to blame anyone for it though).
The best I have found so far, which is still somewhat ambiguous, is
http://docwiki.embarcadero.com/RADStudio/en/Parameters_%28Delphi%29#Constant_Parameters
which says, "A constant (const) parameter is like a local constant or
read-only variable. Constant parameters are similar to value parameters,
except that you can't assign a value to a constant parameter within the
body of a procedure or function, nor can you pass one as a var parameter
to another routine."
Although I acknowledge it is a bit vague, this seems to me to be leaning
heavily in the direction that there is no implicit contract. As it says,
just think about how a local const variable would behave. Additionally,
note that it says you can't assign a value /within the body/ of the
routine. There is no constraint mentioned about modifying other
variables in other routines. However I acknowledge that the absence of a
statement is not proof.
So, the best evidence I have against the implicit contract theory is:
1. By analogy with all the other languages I'm familiar with (C, C++, Java)
2. The fact that if the theory were true it would be a terrible language
idea
3. The fact that the Embarcadero documentation likens a const parameter
to a local const or const variable.
4. Since using a single instance for multiple variables, copy-on-write,
and reference counting are supposed to be opaque implementation details
that provide the illusion that every variable is its own unique
instance, it does not make any sense to believe the programmer can know
when the compiler has chosen to share an instance with other variables,
such that the programmer can determine whether or not const is safe in a
particular situation.
Again, if there is prior documentation to the contrary, I will gladly be
proven wrong, and we can move on to worry about practical steps to
mitigate the possibility of problems rather than whether it is a bug.
However if there is no such specification, then really what we have here
is an undefined aspect of the language. In that case, seeing that it
needs clarification, I would propose that a sane language design
decision be made by rejecting the implicit contract concept.
Michael wrote:
> You can always fool the compiler. The compiler trusts you and assumes
> that what you tell her is true...
Yes, of course you can always fool the compiler, it just shouldn't be
the other way around. The example you gave is very different for one
very important reason: you show using explicit allocating and freeing of
an object. With strings, the programmer does not, and cannot, explicitly
allocate or deallocate the resources, and the problem lies in the
behavior of the automatic allocation and deallocation. Thus there is
nothing in common between this example and the problem at hand.
Alexander wrote:
> So far, we have found two instances of procedures with non-const
> string parameters with the comment to the effect of "do not change to
> const, or the code will break". One of these instances is fairly
> recent, and the developer who made the change did remember that it
> cost him more than a work-day to fix -- and that was after the crash
> report from the client. The company is now considering outright ban
> on all const string parameters,
Wimpie wrote:
> If it helps, I can remember fixing this bug. It took two days...
...
> So yes, it is very scary to me and would like it to be fixed.
Thank you for providing some corroborating evidence that there is in
fact real-world danger here and that it is worth the time to consider.
Florian wrote:
> It affects more types, even shortstring suffers from it
I must respectfully disagree. In the case of shortstring, the value of
the const parameter does get modified, but that is to be expected. If my
understanding is correct (and I'm open to be corrected), the semantics
of ShortString are different. With AnsiString, assigning one string
variable to another is supposed to create the illusion that they are
unique instances. Hence there is copy-on-write. With short strings,
assigning one to another literally means they are the same instance.
Again this comes back to the difference between instance and variable,
and the illusion implicit in AnsiString and dynamic arrays, which I
think is not the case with ShortString (but again I could be wrong).
The problem at issue here is the fact that the compiler can actually
free memory prematurely. In the case of shortstring, it won't crash. The
programmer may have tricked himself, but the compiler isn't doing
anything unexpected. In the case of reference counted types (strings,
dynamic arrays, and interfaces), the compiler is doing something quite
unpredictable. Again, however, this depends on the
specification/documentation. If there is in fact a constraint the
programmer must follow of not modifying other variables that by chance
point to the same instance, then I would agree there is no problem with
ref counted types either and that the programmer has made an error. I
just question that assumption.
Alexander wrote:
> The documentation should recommend users to never use "const
> string" parameters
> except cases where they can prove it to be safe.
> Moreover, since the safety of such code may be compromised by
> completely unrelated changes
> (e.g. by adding {$INLINE OFF} for debugging purposes),
> "const" modifier should not be used at all except the most
> performance-critical code.
If the compiler is actually working correctly and there is an unstated
(perhaps even undocumented) constraint on the programmer to not modify
other variables that use the same instance as a const parameter (even
though the programmer cannot know that), then I fully agree with this
advice.
In fact, with the implicit contract theory, I propose that it would
*never* safe to use const, because the programmer cannot control the
implementation of the ref counted types. It is an explicit part of the
design that the programmer should not, and indeed cannot, be aware of
when/where the compiler allocates, deallocates, or copies instances of
any particular ref counted type. Therefore, semantically there can never
be a guarantee that using const is safe, if the implicit contract is
accepted. I stress, instances and variables are not the same. The
relationship between them is supposed to be an opaque implementation
detail. That is part of the reason I do not believe in this implicit
contract about handling consts.
Now I will acknowledge that const can be used in certain limited
situations without harm. I still stand by the claim that it is
semantically incorrect to ever use const, and I say this on the basis
that if const safety depends on the automatic type implementation and
the automatic type implementation is opaque, it is theoretically wrong
to believe const can ever be used. However in practical terms we do know
that passing a string which is *created* as a local variable in a
routine cannot do harm because it will always have at least 1 refcount.
To summarize:
1. To the programmer, each AnsiString and dynamic array variable is
supposed to be unique, i.e., after doing A := B, modifying A does not
affect B and vice versa.
2. The fact that multiple variables (or parameters) actually can refer
to a single instance is an implementation optimization. That is of no
concern to the programmer. The optimization is implemented by the
combination of reference counting and copy-on-write.
3. The programmer cannot be aware of what the compiler decides to do
regarding how it implements reference counting and copy-on-write. The
programmer should simply know that unique variables are unique instances
for all practical purposes (except var parameters obviously, since that
is the whole point of having to declare it var).
4. If the programmer cannot be aware of when an instance is shared with
multiple variables, an implicit contract that the programmer cannot
modify other variables which by chance are using the same instance is
impossible to obey, and therefore const would be useless.
As further support for #1, consider: With a class, you do not have to
pass it to a procedure as var, but when you modify the instance, you
modify the same instance as was passed to the procedure. That is how
classes work. Having a variable point to the same object as another
variable does in fact mean it's the same instance, the same object. That
is Object Pascal's object model. But with strings, if you want to modify
the string in a procedure and have that affect the argument initially
passed to the procedure you *have* to use var. That alone should be
convincing evidence that the programmer is always supposed to be able to
assume that unique variables are unique instances for automatically
managed strings and arrays.
If I may offer one other comment, I think it would be best to reserve
comments on TStrings/TStringList and other libraries to a separate
conversation and focus on the basic behavior of the compiler or else the
conversation will get too confusing.
More information about the fpc-devel
mailing list