[fpc-devel] assign constant text to widestring
Michael Schnell
mschnell at lumino.de
Thu Oct 23 10:34:31 CEST 2008
> Then you don't understand it yet, I think.
Maybe...
> If the compiler knows your source file is UTF-8 (by BOM or directive),
> the compiler generates a widestring constant and no conversion
> function is called when assigning to a widestring.
In my test the source code is not UTF-8 but UCS-2, and it has a correct
BOM. The compiler handles that correctly and (it seems to be
designed that way) converts the string constant from UCS-2 to UTF-8,
presumably assigning a temporary type like UTF8String to the constant.
As the compiler is not aware of the type UTF8String and erroneously
identifies it with AnsiString, the resulting code interprets the
constant as an AnsiString, and the conversion to WideString produces
the wrong text.
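The effect of that misinterpretation can be simulated by hand. This is
only a sketch, not what the compiler does internally: it assumes the
ANSI code page is Latin-1 (or similar), takes the UTF-8 bytes of 'ö2',
and widens each byte as if it were one ANSI character. The program name
and the Utf8Bytes constant are made up for the illustration.

```pascal
program MisreadUtf8;
{$mode objfpc}{$H+}

const
  // UTF-8 encoding of 'ö2': ö = C3 B6, '2' = 32
  Utf8Bytes: array[0..2] of Byte = ($C3, $B6, $32);

var
  ws: WideString;
  i: Integer;
begin
  // Wrong on purpose: treat every UTF-8 byte as one ANSI character
  // (Latin-1 assumed) and widen it to a WideChar.
  SetLength(ws, Length(Utf8Bytes));
  for i := 0 to High(Utf8Bytes) do
    ws[i + 1] := WideChar(Utf8Bytes[i]);

  WriteLn(Length(ws)); // 3 characters instead of the intended 2
end.
```

The first two characters of ws are U+00C3 and U+00B6 rather than a
single U+00F6, which is the kind of wrong text described above.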
>
> However, if you assign this constant to a utf8string, the compiler
> does a wide->ansi conversion, which is done according to the system
> code page, as the compiler does not know the difference between an
> ansistring and a utf8string. In this case you would need to
> utf8encode your widestring constant to get it into your ansistring in
> UTF-8 encoding.
I am aware of what is happening here, but this is in fact not the way a
decent system should work.
A decent system should be able to do the necessary conversions
automatically:
var
  ws1, ws2: widestring;
  us1, us2: utf8string;
begin
  ..
  ws1 := 'ö2';
  us1 := 'ü3';
  ws2 := us1;
  us2 := ws1;
  memo1.lines.add(inttostr(length(ws1))); // should show 2
  memo1.lines.add(inttostr(length(ws2))); // should show 2
  memo1.lines.add(inttostr(length(us1))); // should show 3 (even if I
  // would like 2, but counting code units is much quicker)
  memo1.lines.add(inttostr(length(us2))); // should show 3
  memo1.lines.add(ws1); // should show ö2
  memo1.lines.add(us2); // should show ö2
  memo1.lines.add(ws2); // should show ü3
  memo1.lines.add(us1); // should show ü3
end;
This should hold independently of the encoding the source code is stored
in (ANSI: no BOM and no other directive, UTF8: BOM=?, UCS2: BOM=FEFF,
UCS4: BOM=?)
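For comparison, here is a minimal sketch of the same round trip done
with explicit conversions, using UTF8Encode and UTF8Decode from the
System unit. The string is built from code points so that the result
does not depend on how this file is stored; the program name is made up
for the illustration.

```pascal
program Utf8RoundTrip;
{$mode objfpc}{$H+}

var
  ws1, ws2: WideString;
  us1: UTF8String;
begin
  // Build 'ö2' from code points, independent of the source encoding.
  SetLength(ws1, 2);
  ws1[1] := WideChar($00F6); // ö
  ws1[2] := WideChar($0032); // 2

  us1 := UTF8Encode(ws1);    // explicit wide -> UTF-8
  ws2 := UTF8Decode(us1);    // explicit UTF-8 -> wide

  WriteLn(Length(ws1));      // 2 UTF-16 code units
  WriteLn(Length(us1));      // 3 bytes: C3 B6 32
  WriteLn(Length(ws2));      // 2 again after the round trip
  WriteLn(ws1 = ws2);        // TRUE
end.
```

What I would like is for the plain assignments ws2 := us1 and
us2 := ws1 to behave like these two explicit calls.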
-Michael