[fpc-devel] Issues with CP_UTF7

Bart bartjunk64 at gmail.com
Mon May 13 22:31:35 CEST 2019


Hi,

See https://forum.lazarus.freepascal.org/index.php/topic,45380.msg320902.html#msg320902

Discussed with Marco, who referred me to this ML.

The following program has 2 issues:

===============
program cps;

{$mode objfpc}
{$h+}

uses
  sysutils;

type
  AsciiString = type AnsiString(CP_ASCII);
  Utf7String  = type AnsiString(CP_UTF7);

var
  U7: Utf7String;
  U8: Utf8String;
  A: AsciiString;
  S,S2: String;
  SS: ShortString;

function StrToHex(const S: rawbytestring): string;
var
  sd: ShortString;
  i: Integer;
begin
  sd := format('[%5d] ',[StringCodePage(S)]);
  for i := 1 to Length(s) do
    sd := sd + '$' + IntToHex(Byte(s[i]), 2) + ' ';
  sd := trim(sd);
  result := sd;
end;

//function StrToHex(const S: UnicodeString): shortstring;
//var
//  sd: ShortString;
//  i: Integer;
//begin
//  sd := '';
//  for i := 1 to Length(s) do
//    sd := sd + '$' + IntToHex(Word(s[i]), 4) + ' ';
//  sd := trim(sd);
//  result := sd;
//end;

begin
  //U7 := 'U7';     cps.lpr(45,9) Error: Unknown codepage "65000"
  repeat
    write('S: ');
    readln(S);
    U7 := S;
    U8 := S;
    A := S;
    writeln('S : ',StrToHex(S),' [',S,']');
    writeln('U8: ',StrToHex(U8),' [',U8,']');
    writeln('U7: ',StrToHex(U7),' [',U7,']');
    writeln('A : ',StrToHex(A),' [',A,']');
    SS := U7;
    S2 := U7;
    U8 := U7;
    writeln('Utf7 -> ShortString: ',StrToHex(SS),' [',SS,']');
    writeln('Utf7 -> CP_ACP     : ',StrToHex(S2),' [',S2,']');
    writeln('Utf7 -> CP_UTF8    : ',StrToHex(U8),' [',U8,']');
  until S='';
end.
===========

1: Uncommenting line 45 (U7 := 'U7') makes the program fail to compile:
cps.lpr(45,9) Error: Unknown codepage "65000"
Why can't I assign a string literal to a UTF7 string, when I can
assign another string variable to it?

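2: Assigning the UTF7 string to a ShortString gives the wrong result:
the ShortString receives the raw UTF7-encoded bytes instead of the text
converted to the system code page (see the "Utf7 -> ShortString" line
in the output below).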

C:\Users\Bart\LazarusProjecten\bugs\Console\cpstring>cps
S: 1 + 1 = 2
S : [ 1252] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
U8: [65001] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
U7: [65000] $31 $20 $2B $2D $20 $31 $20 $2B $41 $44 $30 $2D $20 $32 [1 + 1 = 2]
A : [20127] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
Utf7 -> ShortString: [ 1252] $31 $20 $2B $2D $20 $31 $20 $2B $41 $44 $30 $2D $20 $32 [1 +- 1 +AD0- 2]
Utf7 -> CP_ACP     : [ 1252] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]
Utf7 -> CP_UTF8    : [65001] $31 $20 $2B $20 $31 $20 $3D $20 $32 [1 + 1 = 2]

The docs (https://www.freepascal.org/docs-html/ref/refsu9.html) say:
"Short strings always use the system code page." and "Plain
ansistrings use the system code page."
To me this implies that the result of assigning any ansistring of some
codepage to a plain string must be the same as assigning to a
shortstring.
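
A reduced test case for that expectation could look like the sketch below.
It assigns the same Utf7String once directly to a ShortString and once via
a plain string, then compares the two. The program name and the variable
names (Src, Direct, ViaAnsi, ViaShort) are made up for illustration, and it
assumes a platform where the code page conversion is available (the output
above is from Windows). Going by that output the two results would differ,
but I have not run this reduced version separately.

```pascal
program cps_min;

{$mode objfpc}
{$h+}

type
  Utf7String = type AnsiString(CP_UTF7);

var
  Src, ViaAnsi: String;          // plain ansistrings (system code page)
  U7: Utf7String;
  Direct, ViaShort: ShortString;

begin
  Src := '1 + 1 = 2';
  U7 := Src;            // variable assignment; a literal triggers issue 1

  Direct := U7;         // Utf7String -> ShortString directly (issue 2)

  ViaAnsi := U7;        // Utf7String -> plain string first
  ViaShort := ViaAnsi;  // then plain string -> ShortString

  writeln('direct  : [', Direct, ']');
  writeln('via ansi: [', ViaShort, ']');

  // If ShortString and plain string conversions agree, as the docs
  // seem to imply, both results should be identical.
  if Direct = ViaShort then
    writeln('same result')
  else
    writeln('DIFFERENT result');
end.
```

If the two results do differ, going via a plain string may also serve as a
workaround until the direct assignment is sorted out.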

So, this may be a bug?

(There is in fact a third issue: compile it with -FcUTF8 and the
compiler will crash, which I have already reported in the bug tracker.)

-- 
Bart


