[fpc-devel] Issues with CP_UTF7
Bart
bartjunk64 at gmail.com
Mon May 13 22:31:35 CEST 2019
Hi,
See https://forum.lazarus.freepascal.org/index.php/topic,45380.msg320902.html#msg320902
Discussed with Marco, who referred me to this ML.
The following program has 2 issues:
===============
program cps;
{$mode objfpc}
{$h+}
uses
  sysutils;

type
  AsciiString = type AnsiString(CP_ASCII);
  Utf7String = type AnsiString(CP_UTF7);

var
  U7: Utf7String;
  U8: Utf8String;
  A: AsciiString;
  S,S2: String;
  SS: ShortString;

function StrToHex(const S: rawbytestring): string;
var
  sd: ShortString;
  i: Integer;
begin
  sd := format('[%5d] ',[StringCodePage(S)]);
  for i := 1 to Length(s) do
    sd := sd + '$' + IntToHex(Byte(s[i]), 2) + ' ';
  sd := trim(sd);
  result := sd;
end;

//function StrToHex(const S: UnicodeString): shortstring;
//var
//  sd: ShortString;
//  i: Integer;
//begin
//  sd := '';
//  for i := 1 to Length(s) do
//    sd := sd + '$' + IntToHex(Word(s[i]), 4) + ' ';
//  sd := trim(sd);
//  result := sd;
//end;

begin
  //U7 := 'U7'; cps.lpr(45,9) Error: Unknown codepage "65000"
  repeat
    write('S: ');
    readln(S);
    U7 := S;
    U8 := S;
    A := S;
    writeln('S : ',StrToHex(S),' [',S,']');
    writeln('U8: ',StrToHex(U8),' [',U8,']');
    writeln('U7: ',StrToHex(U7),' [',U7,']');
    writeln('A : ',StrToHex(A),' [',A,']');
    SS := U7;
    S2 := U7;
    U8 := U7;
    writeln('Utf7 -> ShortString: ',StrToHex(SS),' [',SS,']');
    writeln('Utf7 -> CP_ACP : ',StrToHex(S2),' [',S2,']');
    writeln('Utf7 -> CP_UTF8 : ',StrToHex(U8),' [',U8,']');
  until S='';
end.
===========
1: Uncommenting line 45 (U7 := 'U7';) makes the program fail to
compile: cps.lpr(45,9) Error: Unknown codepage "65000"
Why can't I assign a string literal to a UTF7 string, when I can
assign another string variable to it?
2: Assigning the UTF7 string to a ShortString gives the wrong result:
the ShortString receives the still-encoded UTF7 bytes instead of the
decoded text.
C:\Users\Bart\LazarusProjecten\bugs\Console\cpstring>cps
S: 1 + 1 = 2
S : [ 1252] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
U8: [65001] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
U7: [65000] $31 $20 $2B $2D $20 $31 $20 $2B $41 $44 $30 $2D $20 $32 [1 + 1 = 2]
A : [20127] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
Utf7 -> ShortString: [ 1252] $31 $20 $2B $2D $20 $31 $20 $2B $41 $44 $30 $2D $20 $32 [1 +- 1 +AD0- 2]
Utf7 -> CP_ACP : [ 1252] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
Utf7 -> CP_UTF8 : [65001] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
The docs (https://www.freepascal.org/docs-html/ref/refsu9.html) say:
"Short strings always use the system code page." and "Plain
ansistrings use the system code page."
To me this implies that the result of assigning any ansistring of some
codepage to a plain string must be the same as assigning to a
shortstring.
So, this may be a bug?
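In the meantime, a possible workaround might be to go through a plain String first, since the output above shows that Utf7 -> CP_ACP does decode correctly. This is only a sketch under that assumption: the program name u7short and the variable Tmp are made up for illustration, and I have only the Windows behaviour shown above to go on.

```pascal
program u7short;
{$mode objfpc}
{$h+}
type
  Utf7String = type AnsiString(CP_UTF7);
var
  U7: Utf7String;
  Tmp: String;      // hypothetical intermediate, plain (CP_ACP) string
  SS: ShortString;
begin
  Tmp := '1 + 1 = 2';
  U7 := Tmp;        // variable-to-variable assignment compiles (unlike a literal, see issue 1)
  Tmp := U7;        // Utf7 -> CP_ACP: decoded correctly in the output above
  SS := Tmp;        // both sides use the system code page, so no conversion should be needed
  writeln('Utf7 -> String -> ShortString: [',SS,']');
end.
```

If the direct assignment SS := U7 behaved as the docs suggest, the intermediate Tmp would not be needed at all.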
(There is in fact a third issue: compile it with -FcUTF8 and the
compiler will crash, which I have already reported in the bug tracker.)
--
Bart