[fpc-pascal] Weird string behavior
Jonas Maebe
jonas.maebe at elis.ugent.be
Fri Jul 22 15:03:52 CEST 2016
On 22/07/16 14:14, Santiago A. wrote:
>
> program testconvertstr;
You are missing {$h+} here. When posting programs, always include all
switches and/or all command line options. The program also compiles with
string = shortstring (the default), but has different behaviour in that
case.
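As a minimal sketch of where the switch goes (the program name and the
SizeOf check are illustrative additions, not part of the original post):

```pascal
program hswitchdemo;
{$h+}  // string = ansistring; without this switch, string = shortstring

var
  S: string;
begin
  // With {$h+}, S is a reference to heap data, so SizeOf(S) is the size
  // of a pointer. Without the switch, S is a shortstring and SizeOf(S)
  // is 256.
  writeln(SizeOf(S));
end.
```

Compiling the same source with and without the switch is a quick way to
check which string type a test program actually uses.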
> var
> AnsiStrA:string;
> ResultA:string;
> begin
> writeln('Not Initialized');
> writeln(' AnsiStrA: ',stringcodepage(ansistra));
> writeln(' ResultA: ',stringcodepage(ResultA));
The string code page of an empty string is always DefaultSystemCodePage.
> Writeln;writeln('AnsiStrA:='' ''');
> AnsiStrA:=' ';
> writeln(' AnsiStrA: ',stringcodepage(ansistra));
The string code page of constant strings is described at
http://wiki.freepascal.org/FPC_Unicode_support#String_constants . In
this case, it is CP_ACP (= 0) because no source file code page has been set.
> Writeln;writeln('AnsiStrA[1]:=#243; // o acute win-1252');
> AnsiStrA[1]:=#243; // o acute win-1252
> writeln(' AnsiStrA: ',stringcodepage(ansistra));
Changing an individual byte of a string has no influence on its code page.
> Writeln;writeln('ResultA:=AnsiStrA');
> ResultA:=AnsiStrA;
> writeln(' ResultA: ',stringcodepage(ResultA));
Assigning an ansistring to another ansistring with the same declared code
page (both AnsiStrA and ResultA have CP_ACP as declared code page) won't
change the (dynamic) string code page (see
http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page ).
> Writeln;writeln('ResultA := AnsiStrA + '' ''');
> ResultA:=AnsiStrA+' ';
> writeln(' ResultA: ',stringcodepage(ResultA));
See http://wiki.freepascal.org/FPC_Unicode_support#String_concatenation
: the result of a string concatenation will always be converted to the
declared code page of the destination (and CP_ACP represents the current
value of DefaultSystemCodePage, see
http://wiki.freepascal.org/FPC_Unicode_support#Code_page_identifiers ).
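The difference between plain assignment and concatenation can be
isolated in a short sketch. The program and variable names below are
hypothetical, and no specific code page numbers are assumed, since they
depend on the platform and on DefaultSystemCodePage:

```pascal
program concatcp;
{$h+}

var
  A, R: string;
begin
  A := ' ';

  // Plain assignment between strings with the same declared code page:
  // the dynamic code page of A is carried over unchanged.
  R := A;
  writeln('after assignment:    ', StringCodePage(R));

  // Concatenation: the result is converted to the declared code page of
  // the destination, i.e. the current value of DefaultSystemCodePage.
  R := A + ' ';
  writeln('after concatenation: ', StringCodePage(R));

  writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

Going by the explanation above, the second value should match
DefaultSystemCodePage, while the first reflects the code page of the
string constant.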
> Writeln;Writeln('ResultA:=AnsiToUtf8(AnsiStrA);');
> ResultA:=AnsiToUtf8(AnsiStrA);
> writeln(' ResultA: ',stringcodepage(ResultA));
AnsiToUtf8() returns a RawByteString with dynamic code page CP_UTF8 (so
that the dynamic code page matches the actual string encoding).
Assigning a RawByteString to any other string type never results in a
string code page conversion (see
http://wiki.freepascal.org/FPC_Unicode_support#RawByteString ).
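A small sketch of the RawByteString case, assuming FPC 3.0 or later
(where AnsiToUtf8 and CP_UTF8 come from the system unit); the program
and variable names are illustrative:

```pascal
program rawbytecp;
{$h+}

var
  S, U: string;
begin
  S := 'abc';

  // AnsiToUtf8 returns a RawByteString. Assigning it to U copies the
  // reference without converting, so U keeps the dynamic code page of
  // the function result even though U's declared code page is CP_ACP.
  U := AnsiToUtf8(S);

  // Per the explanation above, the two values are expected to be equal.
  writeln(StringCodePage(U) = CP_UTF8);
end.
```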
> Writeln;writeln('ResultA:= AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);');
> ResultA:=AnsiToUtf8(AnsiStrA)+AnsiToUtf8(AnsiStrA);
> writeln(' ResultA: ',stringcodepage(ResultA));
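See again
http://wiki.freepascal.org/FPC_Unicode_support#String_concatenations
(same as before): both operands have dynamic code page CP_UTF8, but the
concatenation result is still converted to the declared code page of
ResultA.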
Jonas