[fpc-devel] Unnecessary string copy from Utf8String to AnsiString if destination CP is UTF8

Ondrej Pokorny lazarus at kluug.net
Sun Apr 28 14:10:41 CEST 2019


On 28.04.2019 12:35, Jonas Maebe wrote:
> On 28/04/2019 09:55, Ondrej Pokorny wrote:
>
> It's probably what Delphi does as well. The result is that the 
> refcount of a string after such an assignment is currently always one.

Thanks for the answer. Yes, Delphi does the same. But strings are 
copy-on-write, so the refcount value doesn't really matter unless you 
modify the resulting string through a PChar. By the way, writing through 
a PAnsiChar/PChar results in a SIGSEGV in FPC but works in Delphi. Could 
you explain this?

program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
   Utf8Str: UTF8String;
   Str: AnsiString;
   P: PAnsiChar;
begin
   DefaultSystemCodePage := 65001;
   Utf8Str := 'hello';
   Str := Utf8Str;
   P := PAnsiChar(Str);
   P[1] := 'x'; // SIGSEGV in FPC, OK in Delphi
   Writeln(Str);     // writes hxllo in Delphi
   Writeln(Utf8Str); // writes hello in Delphi
end.

The documentation doesn't say anything about it: 
https://www.freepascal.org/docs-html/ref/refsu12.html

If changing a string via a PChar is not allowed in FPC, then the argument 
about the refcount is not really valid.

Another thing: if you do the assignment directly (UTF8String -> 
AnsiString), you get a refcount of 1 (=a new copy of the string), but if 
you do the assignment via a RawByteString (UTF8String -> RawByteString 
-> AnsiString), you get a refcount of 3 (=the same string):

program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
   Utf8Str: UTF8String;
   RawStr: RawByteString;
   Str: AnsiString;
begin
   DefaultSystemCodePage := 65001;
   Utf8Str := Copy('hello', 1);
   RawStr := Utf8Str;
   Str := RawStr;
   Writeln(PInteger(PByte(Str) - 8)^); // write refcount
end.


> I've had my share for now fighting with people who rely on 
> implementation details (like this is one), so I'd rather not change 
> that unless Delphi does it too (and even then we may get complaints 
> that FPC is not backwards compatible in this respect).

It's funny to see that the holy mantra of the "implementation detail" is 
used once to support a different behavior and the second time to fight it :)


>> See the attached patch.
>
> Your patch will return an empty string if orgcp is different from both 
> cp and CP_NONE.

This is nonsense. I didn't touch the last else-part that is used when 
orgcp is different from both cp and CP_NONE. You can easily check yourself:

program AnsiUtf8;
var
   Utf8Str: UTF8String;
   Str: AnsiString;
begin
   DefaultSystemCodePage := 1250;
   Utf8Str := 'hello';
   Str := Utf8Str;
   Writeln(Str);
end.

Ondrej



