[fpc-pascal] Weird string behavior

Jonas Maebe jonas.maebe at elis.ugent.be
Fri Jul 22 15:03:52 CEST 2016


On 22/07/16 14:14, Santiago A. wrote:
>
> program testconvertstr;

You are missing {$h+} here. When posting programs, always include all 
switches and/or all command line options. The program also compiles with 
string = shortstring (the default), but has different behaviour in that 
case.
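
As a minimal sketch of why the switch matters (the program name and the 
variable are made up for illustration, they are not from the original 
post), the size of a string variable already differs between the two 
settings:

```pascal
program hswitchdemo;
{$h+}  // string = ansistring; remove this line to get string = shortstring

var
  S: string;
begin
  // With {$h+}, S is a reference to heap data, so it has the size of a pointer.
  // Without it, S is a shortstring: 1 length byte + 255 characters = 256 bytes.
  writeln('SizeOf(S) = ', SizeOf(S));
  writeln('pointer-sized: ', SizeOf(S) = SizeOf(Pointer));
end.
```

Compiling the same source once with and once without the {$h+} line shows 
that the two builds are not the same program, which is why the switches 
belong in the posted source.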

> var
>   AnsiStrA:string;
>   ResultA:string;
> begin
>   writeln('Not Initialized');
>   writeln('  AnsiStrA: ',stringcodepage(ansistra));
>   writeln('  ResultA: ',stringcodepage(ResultA));

The string code page of an empty string is always DefaultSystemCodePage.
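
A small check of this, assuming {$h+} and no other switches (the program 
name is illustrative):

```pascal
program emptycp;
{$h+}

var
  S: string;
begin
  S := '';
  // An empty ansistring has no heap data in which a code page could be
  // stored, so StringCodePage falls back to DefaultSystemCodePage.
  writeln('empty string: ', StringCodePage(S));
  writeln('default:      ', DefaultSystemCodePage);
end.
```

Both lines should show the same number; which number that is depends on 
the platform and its settings.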

>   Writeln;writeln('AnsiStrA:='' ''');
>   AnsiStrA:=' ';
>   writeln('  AnsiStrA: ',stringcodepage(ansistra));

The string code page of constant strings is described at 
http://wiki.freepascal.org/FPC_Unicode_support#String_constants . In 
this case, it is CP_ACP (= 0) because no source file code page has been set.

>   Writeln;writeln('AnsiStrA[1]:=#243; // o acute win-1252');
>   AnsiStrA[1]:=#243; // o acute win-1252
>   writeln('  AnsiStrA: ',stringcodepage(ansistra));

Changing an individual byte of a string has no influence on its code page.

>   Writeln;writeln('ResultA:=AnsiStrA');
>   ResultA:=AnsiStrA;
>   writeln('  ResultA: ',stringcodepage(ResultA));

Assigning an ansistring to another ansistring with the same declared code 
page (both AnsiStrA and ResultA have CP_ACP as declared code page) won't 
change the (dynamic) string code page (see 
http://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page ).
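
To make the difference between declared and dynamic code page visible, 
here is a sketch that is not part of the original program: it uses 
SetCodePage with Convert=false to relabel the string as CP_UTF8 without 
touching its bytes, and then does a plain assignment. The variable names 
and the choice of CP_UTF8 are only for illustration.

```pascal
program assigncp;
{$h+}

var
  A, B: string;   // both have CP_ACP as declared code page
begin
  A := ' ';
  // Relabel only (Convert = false): the dynamic code page becomes CP_UTF8,
  // the declared code page of A is still CP_ACP.
  SetCodePage(RawByteString(A), CP_UTF8, false);
  writeln('A: ', StringCodePage(A));

  // Same declared code page on both sides: no conversion is performed,
  // so B should end up with the same dynamic code page as A.
  B := A;
  writeln('B: ', StringCodePage(B));
end.
```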

>   Writeln;writeln('ResultA := AnsiStrA + '' ''');
>   ResultA:=AnsiStrA+' ';
>   writeln('  ResultA: ',stringcodepage(ResultA));

See http://wiki.freepascal.org/FPC_Unicode_support#String_concatenation 
: the result of a string concatenation will always be converted to the 
declared code page of the destination (and CP_ACP represents the current 
value of DefaultSystemCodePage, see 
http://wiki.freepascal.org/FPC_Unicode_support#Code_page_identifiers ).
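
The same kind of sketch for a concatenation (again not from the original 
program; SetCodePage with Convert=false is only used here to obtain a 
string whose dynamic code page differs from its declared one):

```pascal
program concatcp;
{$h+}

var
  A, R: string;   // declared code page of both: CP_ACP
begin
  A := ' ';
  SetCodePage(RawByteString(A), CP_UTF8, false);
  writeln('A:       ', StringCodePage(A));

  // The concatenation result is converted to the declared code page of R,
  // i.e. CP_ACP, which stands for the current DefaultSystemCodePage.
  R := A + ' ';
  writeln('R:       ', StringCodePage(R));
  writeln('default: ', DefaultSystemCodePage);
end.
```

Following the rule above, the last two lines should agree, regardless of 
what was printed for A.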

>   Writeln;Writeln('ResultA:=AnsiToUtf8(AnsiStrA);');
>   ResultA:=AnsiToUtf8(AnsiStrA);
>   writeln('  ResultA: ',stringcodepage(ResultA));

AnsiToUtf8() returns a RawByteString with dynamic code page CP_UTF8 (so 
that the dynamic code page matches the actual string encoding). 
Assigning a RawByteString to any other string type never results in a 
string code page conversion (see 
http://wiki.freepascal.org/FPC_Unicode_support#RawByteString ).
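
Reduced to the bare minimum, and with made-up program and variable names, 
that case looks like this:

```pascal
program rawcp;
{$h+}

var
  A, R: string;   // declared code page of both: CP_ACP
begin
  A := ' ';
  // AnsiToUtf8 returns a RawByteString labelled CP_UTF8. Assigning it to R
  // performs no conversion, so R keeps the CP_UTF8 dynamic code page even
  // though its declared code page is CP_ACP.
  R := AnsiToUtf8(A);
  writeln('R:       ', StringCodePage(R));
  writeln('CP_UTF8: ', CP_UTF8);
end.
```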

>   Writeln;writeln('ResultA:= AnsiToUtf8(AnsiStrA) + AnsiToUtf8(AnsiStrA);');
>   ResultA:=AnsiToUtf8(AnsiStrA)+AnsiToUtf8(AnsiStrA);
>   writeln('  ResultA: ',stringcodepage(ResultA));

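See again 
http://wiki.freepascal.org/FPC_Unicode_support#String_concatenations 
: same as before, the result of the concatenation is converted to the 
declared code page of the destination (CP_ACP), even though both operands 
have CP_UTF8 as dynamic code page.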


Jonas


