[fpc-pascal] Weird string behavior

Bart bartjunk64 at gmail.com
Fri Jul 22 00:32:36 CEST 2016


On 7/21/16, Santiago A. <svaa at ciberpiula.net> wrote:

> I've come across this issue: When I concatenate two strings in UTF8 they
> are converted to ansi (Win-1252) .

You have declared all string variables as plain "string", which (with
{$H+}) is the same as AnsiString(CP_ACP). So all string variables,
including the ones meant to hold UTF-8, have the encoding of your
active code page.

Declare Utf8StrA and the related variables as Utf8String.
In DisplayBytes, do not use "String" as the parameter type, since that
will again automatically convert the argument. Use RawByteString instead.
The AnsiToUtf8 call is no longer necessary if done this way:

procedure DisplayBytes(S:RawByteString);
var
  i:Integer;
begin
  Write('  ');
  for i:=1 to length(s) do
    write(ord(s[i]),' ');
  writeln;
end;

//-----------------------------------
// body
//-----------------------------------
var
  AnsiStrA:string;
  AnsiStrB:string;
  Utf8StrA: utf8string;
  Utf8StrB:utf8string;
  Utf8StrConcat:utf8string;
begin
  AnsiStrA:=' ';
  AnsiStrA[1]:=#243; // o acute win-1252
  AnsiStrB:='A';

  Write('AnsiStrA: ');DisplayBytes(AnsiStrA); // 243
  Write('AnsiStrB: ');DisplayBytes(AnsiStrB); // 65


  Utf8StrA:=AnsiStrA; // implicit conversion to UTF-8: 195 179
  Utf8StrB:=AnsiStrB; // 65

  writeln;
  Write('Utf8StrA: ');DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB: ');DisplayBytes(Utf8StrB); // 65

  Write('Utf8StrA+Utf8StrB: ');DisplayBytes(Utf8StrA+Utf8StrB);

  writeln;
  Write('Utf8StrA again: ');DisplayBytes(Utf8StrA); // 195 179
  Write('Utf8StrB again: ');DisplayBytes(Utf8StrB); // 65


  Utf8StrConcat:=Utf8StrA+Utf8StrB;
  writeln;
  Write('Utf8StrConcat: ');DisplayBytes(Utf8StrConcat);
end.
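
As a side note, here is a minimal sketch (not part of the original
program) for checking which code page each variable actually carries.
It assumes FPC 3.0 or later with {$H+}. The program and variable names
are made up for illustration, and the printed numbers depend on the
system, so no expected output is shown:

```pascal
{$mode objfpc}{$H+}
program CodePageDemo;

// Hypothetical demo, not part of the original program.
var
  A: string;      // AnsiString(CP_ACP) under {$H+}
  U: UTF8String;  // AnsiString(CP_UTF8)
begin
  A := 'A';
  U := A;         // implicit conversion to UTF-8 on assignment

  // StringCodePage, DefaultSystemCodePage and CP_UTF8 come from the System unit.
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('CP_UTF8:               ', CP_UTF8);
  WriteLn('code page of A:        ', StringCodePage(A));
  WriteLn('code page of U:        ', StringCodePage(U));
end.
```

Comparing the last two lines against DefaultSystemCodePage and CP_UTF8
shows whether a variable is treated as ansi or as UTF-8.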

Bart


