[fpc-pascal] Weird string behavior

Mattias Gaertner nc-gaertnma at netcologne.de
Tue Jul 26 11:43:00 CEST 2016


On Tue, 26 Jul 2016 11:01:28 +0200
Jonas Maebe <jonas.maebe at elis.ugent.be> wrote:

>[...]
> Could you try the same program with u1 as plain ansistring instead of  
> utf8string? (with an additional  
> "setcodepage(rawbytestring(u1),65001,false);" after assigning u1)

Sure:

{$APPTYPE CONSOLE}

type
   tcp866 = type ansistring(866);

var
   s1, s2: tcp866;
   u1: UTF8String;
   r1: RawByteString;
   a1, a2: AnsiString;
begin
   s1:='cp866';
   // Convert=false: only relabel the dynamic code page, the bytes stay unchanged
   setcodepage(rawbytestring(s1),65001,false);
   Writeln('s1 = "', s1, '" cp = ', StringCodePage(s1));
   a1:='acp';
   setcodepage(rawbytestring(a1),65001,false);
   Writeln('a1 = "', a1, '" cp = ', StringCodePage(a1));
   u1:='utf8';
   Writeln('u1 = "', u1, '" cp = ', StringCodePage(u1));

   s2:=s1+u1;
   Writeln('s2:=s1+u1 = "', s2, '" cp = ', StringCodePage(s2));
   s2:=u1+s1;
   Writeln('s2:=u1+s1 = "', s2, '" cp = ', StringCodePage(s2));

   r1:=s1+u1;
   Writeln('r1:=s1+u1 = "', r1, '" cp = ', StringCodePage(r1));
   r1:=u1+s1;
   Writeln('r1:=u1+s1 = "', r1, '" cp = ', StringCodePage(r1));

   a2:=s1+u1;
   Writeln('a2:=s1+u1 = "', a2, '" cp = ', StringCodePage(a2));
   a2:=u1+s1;
   Writeln('a2:=u1+s1 = "', a2, '" cp = ', StringCodePage(a2));

   s2:=s1+a1;
   Writeln('s2:=s1+a1 = "', s2, '" cp = ', StringCodePage(s2));
   s2:=a1+s1;
   Writeln('s2:=a1+s1 = "', s2, '" cp = ', StringCodePage(s2));

   r1:=s1+a1;
   Writeln('r1:=s1+a1 = "', r1, '" cp = ', StringCodePage(r1));
   r1:=a1+s1;
   Writeln('r1:=a1+s1 = "', r1, '" cp = ', StringCodePage(r1));

   a2:=s1+a1;
   Writeln('a2:=s1+a1 = "', a2, '" cp = ', StringCodePage(a2));
   a2:=a1+s1;
   Writeln('a2:=a1+s1 = "', a2, '" cp = ', StringCodePage(a2));

   readln;
end.

Output:


s1 = "cp866" cp = 65001
a1 = "acp" cp = 65001
u1 = "utf8" cp = 65001
s2:=s1+u1 = "cp866utf8" cp = 866
s2:=u1+s1 = "utf8cp866" cp = 866
r1:=s1+u1 = "cp866utf8" cp = 1252
r1:=u1+s1 = "utf8cp866" cp = 1252
a2:=s1+u1 = "cp866utf8" cp = 1252
a2:=u1+s1 = "utf8cp866" cp = 1252
s2:=s1+a1 = "cp866acp" cp = 866
s2:=a1+s1 = "acpcp866" cp = 866
r1:=s1+a1 = "cp866acp" cp = 1252
r1:=a1+s1 = "acpcp866" cp = 1252
a2:=s1+a1 = "cp866acp" cp = 1252
a2:=a1+s1 = "acpcp866" cp = 1252

It seems the Delphi rules for non-RawByteString strings are:
- Concatenating two strings of the same declared type: append the bytes
  and copy the dynamic cp from the left operand. The declared cp of the
  result is that of the left operand.
- Assigning between strings of the same declared type: no conversion,
  only the refcount changes.
- Concatenating two strings of different declared types: convert both to
  UnicodeString and append. Maybe there is an optimization for operands
  with the same dynamic cp.
- Assigning between strings of different declared types: convert to the
  cp of the LHS.
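
The test program above only exercises concatenation, so the two assignment
rules are inferred rather than shown. A minimal sketch for checking them
directly, reusing tcp866 and the setcodepage relabeling from the program
above; the variable a1 and the "shared" pointer comparison are additions for
illustration only. If the rules hold, I would expect the first line to report
cp = 65001 with shared data, and the second to report the default ANSI cp
(1252 in the run above) with a separate copy, but this has not been verified:

```pascal
{$APPTYPE CONSOLE}

type
   tcp866 = type ansistring(866);

var
   s1, s2: tcp866;
   a1: AnsiString;
begin
   s1:='cp866';
   // as in the test above: relabel the dynamic cp, bytes stay as they are
   setcodepage(rawbytestring(s1),65001,false);

   // same declared type: expected to share the data, dynamic cp untouched
   s2:=s1;
   Writeln('s2:=s1 cp = ', StringCodePage(s2),
      ' shared = ', pointer(s1)=pointer(s2));

   // different declared type: expected to be converted to the cp of a1
   a1:=s1;
   Writeln('a1:=s1 cp = ', StringCodePage(a1),
      ' shared = ', pointer(s1)=pointer(a1));
end.
```

Comparing the string pointers is only a rough way to see whether an
assignment bumped the refcount or produced a converted copy.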


Mattias


