[fpc-devel] Const optimization is a serious bug

Chad Berchek ad100 at vobarian.com
Sun Jul 10 19:38:42 CEST 2011


Some thoughts on the meaning of const, constref, and "constval", and how
they can usefully be applied:

My initial understanding of const was hazy. I have come to appreciate
that it is defined, but only in a very loose way. Instead of undefined I
should say ill-defined. I'm not entirely sure what some of the attacks
have been about, but I will acknowledge that I was not 100% sure about
const when I started. What I have said about it being unclear is
certainly true though. Let me explain how, and what I think a better
idea would be.

I wrote:
>> If we *knew* that const meant it would be by reference, then that
>> immediately eliminates the confusion in the case of ShortString
>> and records; modifying one instance through several references
>> affects them all, as expected. What the programmer says happens;
>> there can be no "bug", except in the programmer's own knowledge.

Martin wrote:
> We do know:
> http://www.freepascal.org/docs-html/ref/refsu58.html#x135-14500011.4.4
> A constant argument is passed by reference if its size is larger than
> a pointer. It is passed by value if the size is equal or is less then
> the size of a native pointer.

Jonas wrote:
>>> A constant argument is passed by reference if its size is larger
>>>  than a pointer. It is passed by value if the size is equal or is
>>>  less then the size of a native pointer.
> That part of the manual is wrong. What const does is by design
> completely implementation-dependent

I should not say const is undefined. What I actually mean is that const
is ambiguous; it is sort of defined as undefined, in that: 1) with the
current definition, it changes by platform, 2) the current definition is
wrong anyway, and 3) it is implementation-dependent, which I interpret
to mean it could change in the future.

If the meaning of const is "entirely implementation-dependent", and if I
interpret that correctly, I think we really cannot assume anything about
what exactly will happen with regard to the calling convention, which is
what I was worried about earlier.

Now, constref was intended for compatibility with other languages, but I
think it is a valuable addition to Pascal in its own right and that it
would be wise to use it.

The hypothetical constval modifier is a natural corollary to this. In
the current implementation, if you were to pass a record to a procedure
for example, you have two choices: 1) pass it by value but have it be
non-const, 2) pass it as const, which might be by reference or value.
With constval and constref you would have the other two highly useful
choices: 1) pass by reference and it's const, 2) pass by value and it's
const. Without making assumptions about const, which I don't think we
can, these two options do not currently exist, so I think these new
keywords are a useful addition.
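
To make the four options concrete, here is a minimal Free Pascal sketch.
The record type and procedure names are made up for illustration, and
constval does not exist in any compiler I know of, so it appears only as
a comment. It assumes a compiler version that supports constref.

```pascal
program ParamModes;
{$mode objfpc}{$H+}

type
  // Hypothetical record, larger than a pointer on common targets.
  TPoint3 = record
    X, Y, Z: Double;
  end;

// Exists today: by value, not const. The callee owns a modifiable copy.
procedure ByValue(P: TPoint3);
begin
  P.X := 0; // changes only the callee's copy
end;

// Exists today: const. The callee may not assign to P; whether P is a
// copy or a reference is left to the compiler.
procedure ByConst(const P: TPoint3);
begin
  WriteLn(P.X:0:1);
end;

// Exists today: constref. Read-only and passed by reference.
procedure ByConstRef(constref P: TPoint3);
begin
  WriteLn(P.X:0:1);
end;

// Proposed only (not valid syntax today): read-only, by-value semantics.
// procedure ByConstVal(constval P: TPoint3);

var
  A: TPoint3;
begin
  A.X := 1; A.Y := 2; A.Z := 3;
  ByValue(A);
  ByConst(A);
  ByConstRef(A);
end.
```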

Some might say that it doesn't matter whether it is passed by value or
reference: if you pass it as const, either way you promise not to modify
it. See, that is my concern: what is IT? People have used other words,
but it ultimately comes down to: if you say the programmer promises not
to modify the thing passed as const, what thing *exactly* is that? A
variable, reference, or instance? With const, we don't really know. With
constref, it means you promise not to modify the memory location
(instance) pointed to by that reference. With constval, it means that
you can change the memory location/instance that was passed into the
procedure, since the procedure is now using its own copy anyway; you
just can't modify the copy that the procedure now holds.
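
A small sketch of the "what is IT?" question, using a global record that
is modified while a procedure holds it as a parameter. The type and the
procedure names are hypothetical. I deliberately give no expected output
for the const case, since that is exactly the part we cannot rely on.

```pascal
program ConstAlias;
{$mode objfpc}{$H+}

type
  // Hypothetical record, padded so it is clearly larger than a pointer.
  TBig = record
    Value: Integer;
    Padding: array[0..63] of Byte;
  end;

var
  G: TBig;

procedure ShowByValue(P: TBig);
begin
  G.Value := 2;     // modifies the caller's instance
  WriteLn(P.Value); // P is a separate copy, so it keeps the old value
end;

procedure ShowConstRef(constref P: TBig);
begin
  G.Value := 2;     // modifies the very instance P refers to
  WriteLn(P.Value); // P aliases G, so the change is expected to show
end;

procedure ShowConst(const P: TBig);
begin
  G.Value := 2;     // is this "the thing passed as const"?
  WriteLn(P.Value); // depends on how the compiler chose to pass P
end;

begin
  G.Value := 1;
  ShowByValue(G);
  G.Value := 1;
  ShowConstRef(G);
  G.Value := 1;
  ShowConst(G);
end.
```

In none of the three procedures is P itself assigned to, so the compiler
accepts them all; the difference is only in which instance P names.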

One additional problem does arise. Constval implies that the
implementation is pass-by-value. However in many (I'd say most) cases it
is quite possible that we could be interested only in the semantics of
pass-by-value, not the implementation. So for AnsiStrings, neither
constref nor constval would be suitable. Constref would mean it must be
by reference, but we want by-value semantics. Constval would mean that
the string has to be copied to a separate memory location, which we
might not really care about, and is slow and wasteful.

So, I propose: don't have constval literally mean it is passed by value,
i.e., pushing the string onto the stack or copying it to a new memory
location. Instead, have constval defined as by-value *semantics*. In
other words, constval would indicate the meaning of the language, not
the implementation. It would not be a calling convention. It would not
mean the call would be by value; it would mean the semantics would be by
value. This is essentially what I originally thought const would mean
with AnsiStrings, though it turned out hazy.

The programmer must know the language and the compiler must implement
the language. How the compiler does that should not determine how the
program behaves. If constval is defined in terms of semantics, not
implementation, the compiler can still apply whatever optimizations are
possible as long as they do not break those semantics. So you would not
have to actually copy strings; you'd just have to be sure the semantics
are preserved. In the current implementation, this would simply mean
eliminating the "const optimization" to which I originally objected.
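
For reference, a minimal sketch of the situation I have in mind, assuming
the const optimization means that a const AnsiString parameter does not
take its own counted reference to the string. The procedure names are
made up. StringOfChar is used only to get a heap-allocated string rather
than a constant one.

```pascal
program ConstStringHazard;
{$mode objfpc}{$H+}

var
  S: AnsiString;

// Plain by-value parameter: the callee holds its own counted reference.
procedure TakesValue(P: AnsiString);
begin
  S := 'changed'; // the original string stays alive through P
  WriteLn(P);     // still the original text
end;

// const parameter: under the assumption above, no reference is taken.
procedure TakesConst(const P: AnsiString);
begin
  S := 'changed'; // may release the only reference to the original
  WriteLn(P);     // P may now refer to freed memory; result not defined
end;

begin
  S := StringOfChar('a', 8);
  TakesValue(S);
  S := StringOfChar('a', 8);
  TakesConst(S);
end.
```

With by-value semantics as I propose for constval, TakesConst would be
required to behave like TakesValue, however the compiler achieves that.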

See regarding const:
http://www.freepascal.org/docs-html/ref/refsu58.html
> The main use for this [const] is reducing the stack size, hence
> improving performance, and still retaining the semantics of passing
> by value...

(Thanks for the link Alexander!)
This is undoubtedly the most conclusive statement I've seen regarding
this issue. If this documentation were correct, my initial claim that
there is a bug would be correct. However, I think we're long beyond
that. Instead, the consensus seems to be that the documentation is
incorrect and the implementation will not change. In that case there is
still a need for the functionality that is suggested in that statement.

Up to this point, my observations and proposals:

1. There has been a grand conflation of implementation and semantics.

2. Const is implemented correctly right now. The problem is that,
although defined, it is ill-defined: you can't be quite sure what it's
going to do. If you're in a situation where you don't need to know
whether it's byval or byref then it's OK, but most of the time that's
not good enough. Changing the implementation of const is too risky
anyway.

3. Constval should be created and it should represent semantics, not
implementation. It is not a calling convention. It does not stipulate
how parameters would be passed. It is a construct of the language. As
long as the compiler preserves its meaning, it's OK. In other words,
pass-by-value semantics, not pass-by-value implementation. When it comes
to implementation, which I stress is a separate issue, it would really
come down to removing the const optimization as I originally proposed.

4. Constref is already a calling convention, an implementation.
Fortunately, it automatically carries the requisite semantics with that
implementation, so it's OK as is.

I think Alexander's suggestion is the best I've heard:

> Slowly migrate existing code to either "constref" or "constval", use
>  "const" for legacy/compatibility/extreme optimization cases.

And to summarize the discussion:

1. I initially believed const AnsiString parameters had by-value semantics.
2. The quote in the FPC docs supported this.
3. However it seems that most people agree that those docs are wrong.
4. Most of the discussion has freely mixed semantics and implementation.
5. The more reading I have done and the more examples people have given 
of the behavior of records and shortstrings, the more I have realized 
that you can't really make many assertions about how const will behave. 
It's more complicated and convoluted than I thought.
6. Const is defined. If you don't care about byval or byref semantics,
then it is sufficient. For most cases that's not good enough, in my
opinion, and comparison with many other languages will show that other
people realized this long ago.
7. Constref is a great addition. Although it is defined in terms of its 
implementation, that implementation automatically has the necessary 
semantics, so it is good as-is.
8. Constval, I propose, should be defined as pass-by-value semantics. 
This means the implementation could still be pass-by-reference with 
reference counting and copy-on-write. It would just mean the compiler is 
not free to violate pass-by-value semantics as it currently is permitted 
to do.


