[fpc-pascal] PChar & AnsiString

Michael Van Canneyt michael at freepascal.org
Tue Jun 1 13:05:16 CEST 2010



On Tue, 1 Jun 2010, spir ☣ wrote:

> Hello,
>
>
> The documentation in the ref manual about PChar could use a bit more detail: http://www.freepascal.org/docs-html/ref/refsu13.html#x36-390003.2.7
>
> Do the following statements hold true?
> * This type is mainly intended to interface with C code (or for low-level needs?). Otherwise AnsiString should be preferred (even for low-level work, since AnsiString is also referenced via a pointer?).

PChar is for C code.

> * Like C strings, and unlike AnsiString-s (even if the latter also are "pointed"), PChar strings cannot hold NULL characters (#0). I just checked this point.

Correct.

>
> Also:
> * How is length computed (traversal?)?

Strlen traverses till the first null.
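
To illustrate the difference, here is a minimal sketch (the program name and the string contents are made up for the example). It assumes StrLen from the SysUtils unit. Length reads the length stored with the AnsiString, while StrLen scans the characters until it meets the first #0:

```pascal
program StrLenDemo;
{$mode objfpc}{$H+}

uses
  SysUtils; // provides StrLen(PChar)

var
  S: AnsiString;
begin
  // Build the string at run time so the embedded #0 is certainly kept.
  S := 'abc';
  S := S + #0;
  S := S + 'def';

  WriteLn(Length(S));         // stored length, counts the #0: 7
  WriteLn(StrLen(PChar(S)));  // stops at the first #0: 3
end.
```

So an AnsiString can carry a #0, but anything that treats it as a PChar only sees the part before it.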


>
>
>
> Can AnsiStrings be safely used as dynamic byte arrays? For instance to benefit of ref counting and copy_on_write (if any benefit). Or is it recommended to use Array of Byte?

You can use them.

>
> What is the actual benefit of copy-on-write? I ask because of the following reasoning:

Copy on write is needed to preserve the Pascal nature of strings while
keeping the benefits of reference counted strings.

After

A:='some string'; // Ref count is 1
B:=A;  // Ref count is 2
B[1]:='S'; // Copy, and ref count of B is 1.

the A[1]='s' should still hold true.
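
The same behaviour can be checked with a small program. This is only a sketch (the program name is invented); it compares the data pointers of the two variables to show when the copy actually happens:

```pascal
program CowDemo;
{$mode objfpc}{$H+}

var
  A, B: AnsiString;
begin
  A := 'some string';
  B := A;                           // no character data is copied here
  WriteLn(Pointer(A) = Pointer(B)); // TRUE: both share the same data

  B[1] := 'S';                      // write access makes B unique first
  WriteLn(Pointer(A) = Pointer(B)); // FALSE: B now has its own copy

  WriteLn(A);                       // some string
  WriteLn(B);                       // Some string
end.
```

The assignment itself stays cheap; the copy is only paid for if one of the variables is modified later.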



> * If a string is just used at several places, for example in output or into bigger strings, then there is no reason to copy it into a new variable.



> * If a programmer explicitly assigns an existing string to a new variable, the intent is precisely copy-semantics, to make them independent for further changes. If there is no change, there is also no reason for such an assignment.

This is not correct. Many strings are simply referenced several times.

> As a consequence, s2:=s1 will nearly always be followed by modification of either string, which will result in a copy anyway, according to copy-on-write semantics. So, the initial gain at assignment time is soon lost. Meanwhile the cost I imagine in terms of type complexity remains (every builtin modification method must ensure copying; no user-defined modification method should be possible without using builtin ones -- else copy-on-write is lost and consequences undefined).
>
> What happens if a programmer indirectly modifies an AnsiString (via a pointer) whose ref count is > 1:
>
> Var
>    s1,s2 : AnsiString;
>    pc    : PChar;
> begin
>    s1 := 'abcde' ; s2 := s1;
>    pc := PChar(s1);
>    pc[2] := 'X';
>    writeln(pc,' ',s1,' ',s2);	// abXde abXde abXde
> end.
>
> There must be an error in my reasoning, else language designers would not bother with such complication. What do you think?

If the programmer does this, it is his own fault; a PChar typecast should be
considered read-only and valid only for the duration of the expression.
The docs state as much.
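
If writing through a pointer really cannot be avoided, one possible approach (an assumption for illustration, not something recommended above) is to call UniqueString from the System unit first, so that the variable owns its data before the pointer is taken. A sketch based on the example from the question, with an invented program name:

```pascal
program UniqueDemo;
{$mode objfpc}{$H+}

var
  s1, s2: AnsiString;
  pc: PChar;
begin
  s1 := 'abcde';
  s2 := s1;            // s1 and s2 share the same data

  UniqueString(s1);    // give s1 its own private copy
  pc := PChar(s1);     // pointer into s1's copy only
  pc[2] := 'X';        // PChar indexing is 0-based: third character

  WriteLn(s1, ' ', s2); // abXde abcde
end.
```

The pointer is still only valid as long as s1 is not reassigned or resized, so it should be used immediately and then dropped.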

Michael.

