[fpc-devel] Unnecessary string copy from Utf8String to AnsiString if destination CP is UTF8
Ondrej Pokorny
lazarus at kluug.net
Sun Apr 28 14:10:41 CEST 2019
On 28.04.2019 12:35, Jonas Maebe wrote:
> On 28/04/2019 09:55, Ondrej Pokorny wrote:
>
> It's probably what Delphi does as well. The result is that the
> refcount of a string after such an assignment is currently always one.
Thanks for the answer. Yes, Delphi does the same. But strings are
copy-on-write, so the refcount value doesn't really matter unless you
modify the resulting string through a PChar. By the way, writing through
a PAnsiChar/PChar results in a SIGSEGV in FPC but works in Delphi. Could
you explain this?
program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  Str: AnsiString;
  P: PAnsiChar;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := 'hello';
  Str := Utf8Str;
  P := PAnsiChar(Str);
  P[1] := 'x'; // SIGSEGV in FPC, OK in Delphi
  Writeln(Str); // writes hxllo in Delphi
  Writeln(Utf8Str); // writes hello in Delphi
end.
The documentation doesn't say anything about it:
https://www.freepascal.org/docs-html/ref/refsu12.html
If changing a string via a PChar is not allowed in FPC, then the
refcount argument is not really valid.
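For comparison, here is a minimal sketch of writing through a PAnsiChar after first requesting a private copy of the string. It assumes that UniqueString from the System unit is an acceptable way to obtain a writable buffer in both compilers; the program name UniqueDemo is only illustrative, and this is not a statement about what either compiler guarantees for the programs in this thread.

```pascal
program UniqueDemo;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Str: AnsiString;
  P: PAnsiChar;
begin
  Str := 'hello';
  // Assumption: UniqueString replaces a shared or constant payload
  // with a private heap copy, so the write below does not touch
  // data owned by anyone else.
  UniqueString(Str);
  P := PAnsiChar(Str);
  P[1] := 'x'; // P is zero-based, so this changes the second character
  Writeln(Str);
end.
```

The point of the sketch is only that the write happens on a buffer the variable owns exclusively, which is the situation the refcount argument is about.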
Another thing: if you do the assignment directly (UTF8String ->
AnsiString), you get a refcount of 1 (=a new copy of the string), but if
you do the assignment via a RawByteString (UTF8String -> RawByteString
-> AnsiString), you get a refcount of 3 (=the same string):
program AnsiUtf8;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  RawStr: RawByteString;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := Copy('hello', 1);
  RawStr := Utf8Str;
  Str := RawStr;
  Writeln(PInteger(PByte(Str) - 8)^); // write refcount
end.
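The fixed offset of 8 bytes used above depends on the string header layout, which can differ between compilers and targets. A hedged variant of the same test reads the count through StringRefCount from the System unit instead, assuming that function is available in the compiler version at hand; the program name RefCountDemo is only illustrative.

```pascal
program RefCountDemo;
{$ifdef fpc}{$mode delphi}{$else}{$apptype console}{$endif}
var
  Utf8Str: UTF8String;
  RawStr: RawByteString;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 65001;
  Utf8Str := Copy('hello', 1); // heap-allocated, not a constant
  RawStr := Utf8Str;
  Str := RawStr;
  // Assumption: StringRefCount returns the reference count of the
  // payload without depending on the header layout.
  Writeln(StringRefCount(Str));
end.
```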
> I've had my share for now fighting with people who rely on
> implementation details (like this is one), so I'd rather not change
> that unless Delphi does it too (and even then we may get complaints
> that FPC is not backwards compatible in this respect).
It's funny to see that the holy mantra of the "implementation detail" is
used once to support a different behavior and the second time to fight it :)
>> See the attached patch.
>
> Your patch will return an empty string if orgcp is different from both
> cp and CP_NONE.
This is nonsense. I didn't touch the last else-part that is used when
orgcp is different from both cp and CP_NONE. You can easily check yourself:
program AnsiUtf8;
var
  Utf8Str: UTF8String;
  Str: AnsiString;
begin
  DefaultSystemCodePage := 1250;
  Utf8Str := 'hello';
  Str := Utf8Str;
  Writeln(Str);
end.
Ondrej