[fpc-devel] assign constant text to widestring

Michael Schnell mschnell at lumino.de
Thu Oct 23 10:34:31 CEST 2008


> Then you don't understand it yet, I think. 
Maybe....
> If the compiler knows your source file is UTF-8 (by BOM or directive), 
> the compiler generates a widestring constant and no conversion 
> function is called when assigning to a widestring.
In my test the source code is not UTF-8 but UCS-2, and it does have a
correct BOM. The compiler handles that correctly and (it seems to be
designed that way) converts the string constant in the source from UCS-2
to UTF-8, supposedly temporarily assigning a type like UTF8String to the
constant. As the compiler is not aware of the type UTF8String and
erroneously identifies it with AnsiString, the resulting code interprets
the constant as an AnsiString, and the conversion to WideString produces
wrong text.
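
The effect described above can be reproduced by hand. The following is
only a sketch, not what the compiler does internally: it builds the UTF-8
bytes of U+00F6 followed by '2' byte by byte (so the result does not
depend on how the source file is stored), then reads them once as one
character per byte and once as UTF-8. The byte-per-character loop is an
assumption standing in for the system ANSI code page, and the program
name is hypothetical.

```pascal
program MisreadUtf8;
{$mode objfpc}{$H+}

var
  raw: AnsiString;
  wrong, right: WideString;
  i: Integer;
begin
  // UTF-8 encoding of U+00F6 followed by '2' is the byte sequence C3 B6 32.
  SetLength(raw, 3);
  raw[1] := Chr($C3);
  raw[2] := Chr($B6);
  raw[3] := '2';

  // Misreading: widen every byte to one character (Latin-1-like).
  // Assumption: this stands in for a single-byte ANSI code page.
  wrong := '';
  for i := 1 to Length(raw) do
    wrong := wrong + WideChar(Ord(raw[i]));

  // Correct reading: decode the same bytes as UTF-8.
  right := UTF8Decode(raw);

  WriteLn(Length(wrong));          // 3: one wide character per byte
  WriteLn(Length(right));          // 2: U+00F6 and '2'
  WriteLn(Ord(right[1]) = $F6);    // TRUE
end.
```
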
>
> However, if you assign this constant to an utf8string, the compiler 
> does a wide->ansi conversion, which is done according to the system 
> code page, as the compiler does not know the difference between an 
> ansistring and an utf8string. In this case you would need to 
> utf8encode your widestring constant to get it in your ansistring in 
> UTF-8 encoding.
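
For reference, the explicit route can be sketched with UTF8Encode and
UTF8Decode from the System unit. This is a minimal example under the
assumption that the wide string is built from code points rather than
from a literal, so that the source file encoding plays no role; the
program name is hypothetical.

```pascal
program ExplicitUtf8;
{$mode objfpc}{$H+}

var
  ws1, ws2: WideString;
  us1: UTF8String;
begin
  // Build U+00F6 followed by '2' from code points instead of a literal.
  SetLength(ws1, 2);
  ws1[1] := WideChar($00F6);
  ws1[2] := WideChar(Ord('2'));

  us1 := UTF8Encode(ws1);   // wide -> UTF-8 bytes, requested explicitly
  ws2 := UTF8Decode(us1);   // UTF-8 bytes -> wide, requested explicitly

  WriteLn(Length(ws1));     // 2 wide characters
  WriteLn(Length(us1));     // 3 bytes: C3 B6 32
  WriteLn(ws1 = ws2);       // TRUE: the round trip is lossless
end.
```
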
I am aware of what is happening here, but it in fact is not the way a 
decent system should work.

A decent system should be able to do the necessary conversions 
automatically:

var
  ws1, ws2: widestring;
  us1, us2: utf8string;
begin
 ..
 ws1 := 'ö2';
 us1 := 'ü3';
 ws2 := us1;
 us2 := ws1;
 memo1.lines.add(inttostr(length(ws1)));  // should show 2
 memo1.lines.add(inttostr(length(ws2)));  // should show 2
 memo1.lines.add(inttostr(length(us1)));  // should show 3 (even if I would
                                          // like 2, but counting code units is much quicker)
 memo1.lines.add(inttostr(length(us2)));  // should show 3
 memo1.lines.add(ws1);  // should show ö2
 memo1.lines.add(us2);  // should show ö2
 memo1.lines.add(ws2);  // should show ü3
 memo1.lines.add(us1);  // should show ü3

end;

This should hold independently of the encoding the source code is stored
in (ANSI: no BOM and no other directive, UTF8: BOM=?, UCS2: BOM=FEFF,
UCS4: BOM=?)
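
To make the BOM cases concrete, here is a rough sketch that classifies a
buffer by its leading bytes, using the standard Unicode byte order marks
(EF BB BF for UTF-8, FF FE or FE FF for UTF-16, FF FE 00 00 or
00 00 FE FF for UTF-32). DetectBom and TSourceEncoding are hypothetical
names; this is not a description of how the compiler detects the source
encoding.

```pascal
program BomDemo;
{$mode objfpc}{$H+}

type
  TSourceEncoding = (seNoBom, seUtf8, seUtf16LE, seUtf16BE,
                     seUtf32LE, seUtf32BE);

// Hypothetical helper: classify a buffer by its first bytes.
function DetectBom(const b: array of Byte): TSourceEncoding;
var
  n: Integer;
begin
  n := Length(b);
  Result := seNoBom;
  // The 4-byte UTF-32 marks are tested first, because the UTF-32 LE mark
  // begins with the same two bytes as the UTF-16 LE mark.
  if (n >= 4) and (b[0] = $FF) and (b[1] = $FE) and (b[2] = 0) and (b[3] = 0) then
    Result := seUtf32LE
  else if (n >= 4) and (b[0] = 0) and (b[1] = 0) and (b[2] = $FE) and (b[3] = $FF) then
    Result := seUtf32BE
  else if (n >= 3) and (b[0] = $EF) and (b[1] = $BB) and (b[2] = $BF) then
    Result := seUtf8
  else if (n >= 2) and (b[0] = $FF) and (b[1] = $FE) then
    Result := seUtf16LE
  else if (n >= 2) and (b[0] = $FE) and (b[1] = $FF) then
    Result := seUtf16BE;
end;

begin
  WriteLn(DetectBom([$EF, $BB, $BF, $41]) = seUtf8);      // TRUE
  WriteLn(DetectBom([$FF, $FE, $41, $00]) = seUtf16LE);   // TRUE
  WriteLn(DetectBom([$41, $42]) = seNoBom);               // TRUE
end.
```

A file without any mark falls through to seNoBom, which corresponds to
the "ANSI: no BOM and no other directive" case.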

-Michael


