[fpc-devel] Const optimization is a serious bug

Thu Jul 7 07:15:54 CEST 2011

I have some observations on the discussion so far. The biggest question 
is what the intended behavior is.

Martin wrote:
> Well, I have pointed out myself,in my mail, that it probably needs
> more documentation. I do not know if it is documented or not.
>
> But it is the answer, I have gotten several times from developers in
> the FPC team. So for all I know it is the intended behaviour. At
> least intended in FPC (and apparently either intended or at least
> implemented in Delphi). So if there is no documentation for it, then
> it would appear a problem of documentation, rather than a bug in the
> compiler (Again all based on the statements I was given)

Thaddy wrote:
> It is a contract between the compiler and the programmer in which it
> is expected that the string will not be modified inside a procedure,
> function or method.

This is the crux of the controversy. I realized this when I was writing 
the original post, but did not mention it explicitly because I thought 
it would come up anyway.

The difference between a feature and a bug is the specifications. Here 
the specifications are the documentation. I have not found any 
documentation in either FPC or Delphi that there is some implicit 
contract whereby the programmer promises not to modify other variables 
which happen to refer to the same instance as a const parameter. Many 
people have repeatedly stated that this is the programmer's fault. If 
there is an implicit agreement with the programmer, then yes I agree 
with these statements and I believe it is not a compiler bug (although 
certainly not good language design).

However I am looking for documentation. Has anyone found anything yet? 
If anyone can find anything I would be pleased as it would settle the 
question. But lacking any documentation, I don't see how anyone should 
know there is an implicit contract. To me, a const parameter means that 
you cannot modify that parameter by pointing it to something else, nor 
(in the case of strings and dynamic arrays) alter the contents by means 
of said parameter. (Although, you can't really "alter" the contents of 
the instance, as copy-on-write simply creates a whole new instance.) 
That's what it means in other languages I've used, and nothing more. I 
don't see it as implying anything else. Furthermore, as many examples 
have shown, the programmer often /cannot/ know whether several variables 
refer to the same instance, since the handling of creating and 
destroying instances, copy-on-write, etc., is handled by the compiler 
and is considered an implementation detail that should be opaque to the 
programmer.

I don't know how many of you have actually looked at the demo I posted, 
but here is the crucial part. The demo program contains this line:

FCurrentDriverName := Edit1.Text;

In this state, the program works perfectly. However, if this line is 
changed as follows:

FCurrentDriverName := Edit1.Text + 'abc';

the program then crashes. IMHO, this is very scary. All you have to do 
is make a tiny, harmless change and suddenly a working program crashes. 
Also in the demo you will notice that the programmer doesn't even call a 
function that takes a const parameter; the problem is caused by setting 
a parameter, and it just so happens that behind the scenes the 
parameter's setter takes a const parameter.

Unless there is some documentation I am unaware of, I don't agree with 
the implicit contract theory. Instances and variables are not the same. 
People confuse them because that is actually exactly the point of the 
Pascal construct: the compiler creates the illusion that each variable 
is an instance. This is why there is copy on write; so you can modify a 
variable and it doesn't modify other variables that (prior to 
modification) referred to the same instance. But again it is only an 
illusion. In the implementation, a variable (or parameter) and an 
instance of an automatic type are not the same, and that is where the 
problem is rooted. The management of these is an opaque implementation 
detail. The programmer cannot be expected to know whether or not the 
compiler has chosen to use the same instance for two 
variables/parameters, and yet that is what the implicit contract theory 
states.

As in C, Java, etc., if I have a const variable, that means it's a const 
variable/parameter; i.e., I can't change it to point to something else. 
It doesn't carry any implications about other variables that may be 
pointing to the same instance. If the "implicit const contract" is 
indeed true, then I agree there is not a compiler bug, just a poorly 
conceived language feature (please note I most certainly am not trying 
to blame anyone for it though).

The best I have found so far, which is still somewhat ambiguous, is

http://docwiki.embarcadero.com/RADStudio/en/Parameters_%28Delphi%29#Constant_Parameters

which says, "A constant (const) parameter is like a local constant or 
read-only variable. Constant parameters are similar to value parameters, 
except that you can't assign a value to a constant parameter within the 
body of a procedure or function, nor can you pass one as a var parameter 
to another routine."

Although I acknowledge it is a bit vague, this seems to me to be leaning 
heavily in the direction that there is no implicit contract. As it says, 
just think about how a local const variable would behave. Additionally, 
note that it says you can't assign a value /within the body/ of the 
routine. There is no constraint mentioned about modifying other 
variables in other routines. However I acknowledge that the absence of a 
statement is not proof.

So, the best evidence I have against the implicit contract theory is:
1. By analogy with all the other languages I'm familiar with (C, C++, Java)
2. The fact that if the theory were true it would be a terrible language 
idea
3. The fact that the Embarcadero documentation likens a const parameter 
to a local const or const variable.
4. Since using a single instance for multiple variables, copy-on-write, 
and reference counting are supposed to be opaque implementation details 
that provide the illusion that every variable is its own unique 
instance, it does not make any sense to believe the programmer can know 
when the compiler has chosen to share an instance with other variables, 
such that the programmer can determine whether or not const is safe in a 
particular situation.

Again, if there is prior documentation to the contrary, I will gladly be 
proven wrong, and we can move on to worry about practical steps to 
mitigate the possibility of problems rather than whether it is a bug. 
However if there is no such specification, then really what we have here 
is an undefined aspect of the language. In that case, seeing that it 
needs clarification, I would propose that a sane language design 
decision be made by rejecting the implicit contract concept.

Michael wrote:
> You can always fool the compiler. The compiler trusts you and assumes
> that what you tell her is true...

Yes, of course you can always fool the compiler, it just shouldn't be
the other way around. The example you gave is very different for one
very important reason: you show using explicit allocating and freeing of
an object. With strings, the programmer does not, and cannot, explicitly
allocate or deallocate the resources, and the problem lies in the
behavior of the automatic allocation and deallocation. Thus there is
nothing in common between this example and the problem at hand.

Alexander wrote:
> So far, we have found two instances of procedures with non-const
> string parameters with the comment to the effect of "do not change to
> const, or the code will break". One of these instances is fairly
> recent, and the developer who made the change did remember that it
> cost him more than a work-day to fix -- and that was after the crash
>  report from the client. The company is now considering outright ban
>  on all const string parameters,

Wimpie wrote:
> If it helps, I can remember fixing this bug. It took two days...
...
> So yes, it is very scary to me and would like it to be fixed.

Thank you for providing some corroborating evidence that there is in
fact real-world danger here and that it is worth the time to consider.

Florian wrote:
> It affects more types, even shortstring suffers from it

I must respectfully disagree. In the case of shortstring, the value of 
the const parameter does get modified, but that is to be expected. If my 
understanding is correct (and I'm open to be corrected), the semantics 
of ShortString are different. With AnsiString, assigning one string 
variable to another is supposed to create the illusion that they are 
unique instances. Hence there is copy-on-write. With short strings, 
assigning one to another literally means they are the same instance. 
Again this comes back to the difference between instance and variable, 
and the illusion implicit in AnsiString and dynamic arrays, which I 
think is not the case with ShortString (but again I could be wrong).

The problem at issue here is the fact that the compiler can actually 
free memory prematurely. In the case of shortstring, it won't crash. The 
programmer may have tricked himself, but the compiler isn't doing 
anything unexpected. In the case of reference counted types (strings, 
dynamic arrays, and interfaces), the compiler is doing something quite 
unpredictable. Again, however, this depends on the 
specification/documentation. If there is in fact a constraint the 
programmer must follow of not modifying other variables that by chance 
point to the same instance, then I would agree there is no problem with 
ref counted types either and that the programmer has made an error. I 
just question that assumption.

Alexander wrote:
> The documentation should recommend users to never use "const
> string" parameters
> except cases where they can prove it to be safe.
> Moreover, since the safety of such code may be compromised by
> completely unrelated changes
> (e.g. by adding {$INLINE OFF} for debugging purposes),
> "const" modifier should not be used at all except the most
> performance-critical code.

If the compiler is actually working correctly and there is an unstated 
(perhaps even undocumented) constraint on the programmer to not modify 
other variables that use the same instance as a const parameter (even 
though the programmer cannot know that), then I fully agree with this 
advice.

In fact, with the implicit contract theory, I propose that it would 
*never* safe to use const, because the programmer cannot control the 
implementation of the ref counted types. It is an explicit part of the 
design that the programmer should not, and indeed cannot, be aware of 
when/where the compiler allocates, deallocates, or copies instances of 
any particular ref counted type. Therefore, semantically there can never 
be a guarantee that using const is safe, if the implicit contract is 
accepted. I stress, instances and variables are not the same. The 
relationship between them is supposed to be an opaque implementation 
detail. That is part of the reason I do not believe in this implicit 
contract about handling consts.

Now I will acknowledge that const can be used in certain limited 
situations without harm. I still stand by the claim that it is 
semantically incorrect to ever use const, and I say this on the basis 
that if const safety depends on the automatic type implementation and 
the automatic type implementation is opaque, it is theoretically wrong 
to believe const can ever be used. However in practical terms we do know 
that passing a string which is *created* as a local variable in a 
routine cannot do harm because it will always have at least 1 refcount.

To summarize:

1. To the programmer, each AnsiString and dynamic array variable is 
supposed to be unique, i.e., after doing A := B, modifying A does not 
affect B and vice versa.

2. The fact that multiple variables (or parameters) actually can refer 
to a single instance is an implementation optimization. That is of no 
concern to the programmer. The optimization is implemented by the 
combination of reference counting and copy-on-write.

3. The programmer cannot be aware of what the compiler decides to do 
regarding how it implements reference counting and copy-on-write. The 
programmer should simply know that unique variables are unique instances 
for all practical purposes (except var parameters obviously, since that 
is the whole point of having to declare it var).

4. If the programmer cannot be aware of when an instance is shared with 
multiple variables, an implicit contract that the programmer cannot 
modify other variables which by chance are using the same instance is 
impossible to obey, and therefore const would be useless.

As further support for #1, consider: With a class, you do not have to 
pass it to a procedure as var, but when you modify the instance, you 
modify the same instance as was passed to the procedure. That is how 
classes work. Having a variable point to the same object as another 
variable does in fact mean it's the same instance, the same object. That 
is Object Pascal's object model. But with strings, if you want to modify 
the string in a procedure and have that affect the argument initially 
passed to the procedure you *have* to use var. That alone should be 
convincing evidence that the programmer is always supposed to be able to 
assume that unique variables are unique instances for automatically 
managed strings and arrays.

If I may offer one other comment, I think it would be best to reserve 
comments on TStrings/TStringList and other libraries to a separate 
conversation and focus on the basic behavior of the compiler or else the 
conversation will get too confusing.