[fpc-devel] cp1252 problems

Hans-Peter Diettrich DrDiettrich1 at aol.com
Mon Dec 8 00:27:29 CET 2014


Using FPC trunk and Lazarus on WinXP, with the source file and $codepage 
both UTF-8; DefaultSystemCodePage is 1252.

Then AnsiString variables can contain either UTF-8 or cp1252 strings 
(inconsistent), but that's an already known problem :-(

Now I found another bug with AnsiString(1252), which IMO should behave 
like AnsiString(CP_ACP). Unfortunately it does not: the same 
assignments of literals to both variables lead to different strings:

type
   WinAnsiString = type AnsiString(1252);
const
   cACP: AnsiString = 'ä'; //encoded UTF-8 = 'ä'
   cWin: WinAnsiString = 'ä'; //encoded 1252 = 'ä?'
var
   strA: AnsiString;
   strW: WinAnsiString;
begin
   strA := 'ä'; //encoded UTF-8 = 'ä'
   strW := 'ä'; //encoded 1252 = 'ä?'
   WriteLn('equal ',strA=strW); //FALSE!
   strW := cACP; //1252 'ä' okay
   strA := cWin; //1252 'ä?' wrong as above
end;

It looks to me as if the cp1252 strings (both const and var) are 
converted from a UTF-16 char, with its 2 bytes turned into 2 chars: the 
first char is the letter, the second one is the UTF-16 high byte (0), 
which ends up as '?' (#63).
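
A minimal sketch for checking this at the byte level, independent of how 
the console or debugger renders the strings. Dump is a hypothetical helper 
that is not part of the test above; StringCodePage and HexStr come from 
the System unit. For reference, 'ä' (U+00E4) is C3 A4 in UTF-8 and the 
single byte E4 in cp1252, so a trailing 3F in strW would be the '?' 
described above.

```pascal
program cp1252dump;
{$mode objfpc}{$H+}
{$codepage UTF8}

type
  WinAnsiString = type AnsiString(1252);

// Hypothetical helper: print the dynamic code page, the length and the raw
// bytes of a string. The RawByteString parameter avoids any implicit
// conversion when the string is passed in.
procedure Dump(const aLabel: string; const s: RawByteString);
var
  i: Integer;
begin
  Write(aLabel, ': cp=', StringCodePage(s), ' len=', Length(s), ' bytes=');
  for i := 1 to Length(s) do
    Write(HexStr(Ord(s[i]), 2), ' ');
  WriteLn;
end;

var
  strA: AnsiString;
  strW: WinAnsiString;
begin
  strA := 'ä';
  strW := 'ä';
  Dump('strA', strA);
  Dump('strW', strW);
end.
```

Running the same Dump calls on a longer literal such as 'äöü' should 
show whether only single-character literals are affected.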

Longer literals, like 'äöü', are converted properly, but end up encoded 
as UTF-8 in the AnsiString and as 1252 in the WinAnsiString.

Should I submit a bug report?

DoDi



