[fpc-pascal] Weird string behavior
Bart
bartjunk64 at gmail.com
Fri Jul 22 00:32:36 CEST 2016
On 7/21/16, Santiago A. <svaa at ciberpiula.net> wrote:
> I've come across this issue: When I concatenate two strings in UTF8 they
> are converted to ansi (Win-1252) .
You have declared all of your string variables as plain "string", which
is the same as AnsiString(CP_ACP). So every one of them, including the
ones meant to hold UTF-8, has the encoding of your active code page.
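
A quick way to see which code page each declared type carries is to ask
the RTL. This is only a minimal sketch, assuming FPC 3.0 or later and
{$H+}; the program name and variable names are made up for the
illustration. StringCodePage and DefaultSystemCodePage come from the
System unit. The UTF8String should report 65001 (CP_UTF8); what the plain
"string" reports depends on your system and on how the compiler stores
the constant (it may show 0, i.e. CP_ACP).

```pascal
program CodePageDemo;
{$mode objfpc}{$H+}

var
  A: string;      // AnsiString(CP_ACP) under {$H+}
  U: UTF8String;  // AnsiString(CP_UTF8)
begin
  A := 'A';
  U := 'A';
  // Code page the RTL uses for CP_ACP strings on this system.
  writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // Code page recorded in each string's header.
  writeln('string:                ', StringCodePage(A));
  writeln('UTF8String:            ', StringCodePage(U));
end.
```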
Declare Utf8StrA and the related variables as Utf8String.
In DisplayBytes, do not use "String" as the parameter type, since that
will again convert the argument automatically.
Done this way, the call to AnsiToUtf8 is no longer necessary:
procedure DisplayBytes(S: RawByteString);
var
  i: Integer;
begin
  Write(' ');
  for i := 1 to Length(S) do
    Write(Ord(S[i]), ' ');
  Writeln;
end;
//-----------------------------------
// body
//-----------------------------------
var
  AnsiStrA: string;
  AnsiStrB: string;
  Utf8StrA: utf8string;
  Utf8StrB: utf8string;
  Utf8StrConcat: utf8string;
begin
  AnsiStrA := ' ';
  AnsiStrA[1] := #243; // o acute win-1252
  AnsiStrB := 'A';
  Write('AnsiStrA: '); DisplayBytes(AnsiStrA); // 243
  Write('AnsiStrB: '); DisplayBytes(AnsiStrB); // 65
  Utf8StrA := (AnsiStrA); // 195 179
  Utf8StrB := (AnsiStrB); // 65
  Writeln;
  Write('Utf8StrA: '); DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB: '); DisplayBytes(Utf8StrB); // 65
  Write('Utf8StrA+Utf8StrB: '); DisplayBytes(Utf8StrA + Utf8StrB);
  Writeln;
  Write('Utf8StrA again: '); DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB again: '); DisplayBytes(Utf8StrB); // 65
  Utf8StrConcat := Utf8StrA + Utf8StrB;
  Writeln;
  Write('Utf8StrConcat: '); DisplayBytes(Utf8StrConcat);
end.
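
To see the effect of the parameter type in isolation, something like the
following can be used. It is a sketch only, again assuming FPC 3.0 or
later with {$H+}; ViaString and ViaRaw are hypothetical helpers named
just for this example. The same UTF8String is passed to both: the
"string" parameter lets the compiler convert it to the system code page,
while the RawByteString parameter receives the bytes untouched. The
actual numbers printed depend on your active code page, so none are
given here.

```pascal
program ParamDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: a plain "string" parameter is AnsiString(CP_ACP),
// so a UTF8String argument may be converted before the call.
procedure ViaString(const S: string);
begin
  writeln('string param:        ', Length(S), ' byte(s), code page ',
          StringCodePage(S));
end;

// Hypothetical helper: RawByteString accepts any code page as-is.
procedure ViaRaw(const S: RawByteString);
begin
  writeln('RawByteString param: ', Length(S), ' byte(s), code page ',
          StringCodePage(S));
end;

var
  U: UTF8String;
begin
  // Build the two UTF-8 bytes of o acute by hand, so that no
  // source-file encoding is involved.
  SetLength(U, 2);
  U[1] := #195;
  U[2] := #179;
  ViaString(U);
  ViaRaw(U);
end.
```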
Bart