[fpc-pascal] PChar & AnsiString

spir ☣ denis.spir at gmail.com
Tue Jun 1 12:23:29 CEST 2010


Hello,


The documentation on PChar in the reference manual could use a bit more detail: http://www.freepascal.org/docs-html/ref/refsu13.html#x36-390003.2.7

Do the following statements hold true?
* This type is mainly intended to interface with C code (or for low-level needs?). Otherwise AnsiString should be preferred (even for low-level work, since an AnsiString is also referenced via a pointer?).
* Like C strings, and unlike AnsiStrings (even though the latter are also "pointed"), PChar strings cannot hold NUL characters (#0). I just checked this point.

Also:
* How is the length computed (by traversal?)
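
The two points above (embedded #0 and length) can be checked with a small test program. This is only a sketch, assuming that Length reads the length stored with the AnsiString while StrLen (here taken from SysUtils) scans for the first #0; the program name is made up for the example.

```pascal
program PCharLength;
{$mode objfpc}{$H+}
uses
  SysUtils;  // provides StrLen
var
  s  : AnsiString;
  pc : PChar;
begin
  s  := 'abc' + #0 + 'de';  // an AnsiString may contain #0
  pc := PChar(s);           // same buffer, viewed as a C-style string
  WriteLn(Length(s));       // stored length, counts the #0: 6
  WriteLn(StrLen(pc));      // scans up to the first #0: 3
end.
```

If this holds, the PChar view "loses" everything after the first #0, while the AnsiString keeps it.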



Can AnsiStrings be safely used as dynamic byte arrays? For instance, to benefit from ref counting and copy-on-write (if there is any benefit). Or is it recommended to use Array of Byte instead?

What is the actual benefit of copy-on-write? I ask because of the following reasoning:
* If a string is just used in several places, for example in output or as part of bigger strings, then there is no reason to copy it into a new variable.
* If a programmer explicitly assigns an existing string to a new variable, the intent is precisely copy semantics, to make the two independent for further changes. If there is no change, there is also no reason for such an assignment.
As a consequence, s2:=s1 will nearly always be followed by a modification of either string, which will result in a copy anyway, according to copy-on-write semantics. So the initial gain at assignment time is soon lost, while the cost I imagine in terms of type complexity remains (every builtin modification method must ensure copying; no user-defined modification method should be possible without using builtin ones -- else copy-on-write is lost and the consequences are undefined).

What happens if a programmer indirectly modifies an AnsiString (via a pointer) whose ref count is > 1:

Var
    s1,s2 : AnsiString;  
    pc    : PChar;  
begin
    s1 := 'abcde' ; s2 := s1;
    pc := PChar(s1);
    pc[2] := 'X';
    writeln(pc,' ',s1,' ',s2);	// abXde abXde abXde
end.
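
For contrast, here is a sketch of the same change made without bypassing the string type: first with an indexed write, then with the pointer write preceded by a call to UniqueString (a System unit procedure). The assumption is that both paths give s1 its own copy before the write, so s2 keeps the original text; the program name is made up for the example.

```pascal
program CowContrast;
{$mode objfpc}{$H+}
var
  s1, s2 : AnsiString;
  pc     : PChar;
begin
  // Indexed write: s1 gets a private copy before the change.
  s1 := 'abcde'; s2 := s1;
  s1[3] := 'X';
  WriteLn(s1, ' ', s2);   // abXde abcde

  // Pointer write, but only after explicitly requesting a unique copy.
  s1 := 'abcde'; s2 := s1;
  UniqueString(s1);
  pc := PChar(s1);
  pc[2] := 'X';
  WriteLn(s1, ' ', s2);   // abXde abcde
end.
```

So the sharing seems to leak only when the write goes through a raw pointer taken while the buffer is still shared.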

There must be an error in my reasoning, else language designers would not bother with such complication. What do you think?



Denis
________________________________

vit esse estrany ☣

spir.wikidot.com


