[fpc-devel] cp1252 problems
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Mon Dec 8 00:27:29 CET 2014
Using FPC trunk and Lazarus on WinXP, with both the source file encoding
and $codepage set to UTF-8; DefaultSystemCodePage is 1252.
Then AnsiString variables can contain either UTF-8 or cp1252 strings
(inconsistent), but that's an already known problem :-(
Now I found another bug with AnsiString(1252), which IMO should behave
like AnsiString(CP_ACP). Unfortunately it does not: assigning the same
literal to both kinds of variables leads to different strings:
type
  WinAnsiString = type AnsiString(1252);
const
  cACP: AnsiString = 'ä';     //encoded UTF-8 = 'ä'
  cWin: WinAnsiString = 'ä';  //encoded 1252 = 'ä?'
var
  strA: AnsiString;
  strW: WinAnsiString;
begin
  strA := 'ä';  //encoded UTF-8 = 'ä'
  strW := 'ä';  //encoded 1252 = 'ä?'
  WriteLn('equal ',strA=strW);  //FALSE!
  strW := cACP;  //1252 'ä' okay
  strA := cWin;  //1252 'ä?' wrong as above
end;
It looks to me as if the cp1252 strings (both const and var) are
converted from a UTF-16 char (2 bytes into 2 chars), with the first
char being the letter and the second one being the UTF-16 high byte (0),
which comes out as '?' (#63).
Longer literals, like 'äöü', are converted properly, but to encoding
UTF-8 for AnsiString and encoding 1252 for WinAnsiString.
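To check the guess about the stray '?' at the byte level, something like
the following test program could be used. It is only a sketch: DumpStr is
a made-up helper, and {$mode objfpc} is an assumption (the Lazarus
default); the rest follows the setup described above. It prints the
dynamic code page, the length and the raw bytes of each string without
converting them.

```pascal
program cp1252dump;
{$mode objfpc}{$H+}   // assumed mode; not stated in the report above
{$codepage utf8}      // source file must be saved as UTF-8

uses
  SysUtils;

type
  WinAnsiString = type AnsiString(1252);

// Hypothetical helper: a RawByteString parameter accepts any AnsiString
// variant without code page conversion, so the bytes are shown as stored.
procedure DumpStr(const name: string; const s: RawByteString);
var
  i: Integer;
begin
  Write(name, ': cp=', StringCodePage(s), ' len=', Length(s), ' bytes=');
  for i := 1 to Length(s) do
    Write(IntToHex(Ord(s[i]), 2), ' ');
  WriteLn;
end;

var
  strA: AnsiString;
  strW: WinAnsiString;
begin
  strA := 'ä';
  strW := 'ä';
  // For reference: UTF-8 encodes 'ä' as C3 A4, cp1252 encodes it as E4.
  // If the guess above is right, strW would show E4 3F instead of E4.
  DumpStr('strA', strA);
  DumpStr('strW', strW);
end.
```

Repeating the two assignments with 'äöü' should show whether only
single-character literals are affected.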
Should I submit a bug report?
DoDi