[fpc-devel] Unicode support (yet again)
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Sun Sep 18 15:07:07 CEST 2011
Luiz Americo Pereira Camara wrote:
> On 17/9/2011 11:46, Hans-Peter Diettrich wrote:
>> Luiz Americo Pereira Camara wrote:
>>
>>> The codepage of a RawByteString at runtime will keep the previous
>>> CodePage (65001 for UTF8, 1200 for UTF16), as opposed to changing to the
>>> RawByteString CodePage (65535) as I thought previously
>>
>> Delphi defines RawByteString=AnsiString, so there is no room for
>> UTF-16 in such a string.
>
> No. I was wrong. See Florian email. RawByteString will keep the codepage
> (1200 = UTF16) and the data of the assigned string be UTF8, be UTF8.
>
>>
>>> So the implementation would be:
>>>
>>> function FileGetAttr(const FileName: RawByteString): Longint;
>>> begin
>>> SetCodePage(FileName, 1200, True);
>>
>> Won't work, because of "const",
>
> Yes
>
>> and because UTF-16 is not a Byte (AnsiChar) string :-(
>
> No. See above. Look on the net for the Delphi and Unicode doc by Marco Cantu
Can you give me a link? I checked the XE documentation and RTL, and
could not find that RawByteString can hold UTF-16, and my test confirms
that:
var
  a: AnsiString;
  u: UnicodeString;

procedure test(r: RawByteString; cp: word);
begin
  WriteLn('in: ', StringElementSize(r), ' cp: ', StringCodePage(r), ' len=', length(r));
  WriteLn('"', r, '"'); //writes garbage for non-OEM chars, of course
  SetCodePage(r, cp, true);
  WriteLn('out: ', StringElementSize(r), ' cp: ', StringCodePage(r), ' len=', length(r));
  a := r; //use the result, so that nothing can be optimized away
  WriteLn('"', r, '"');
end;
This reveals the following behaviour:
1) UnicodeString is converted to AnsiString, before passed to test.
2) Setting codepage to 1200 doesn't change anything.
3) Conversion to UTF-8 seems to work (length changed).
4) Conversion from UTF-8 to Ansi results in an empty string.
I'll ask in an Embarcadero group, particularly about [4].
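
On the "const" objection to the quoted FileGetAttr: one possible way around it, as a minimal sketch rather than anything proposed in this thread, is to convert a local copy instead of the parameter itself. ToCodePage is a hypothetical helper name, not an RTL function, and the sketch assumes a compiler with code-page-aware strings (Delphi 2009 or later, or FPC 3.0 or later). It converts to 65001 (UTF-8) rather than 1200, since the test above suggests that 1200 has no effect on a byte string.

```pascal
program ConstCodePageSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not an RTL function): returns a copy of S
// converted to code page CP, leaving the const argument untouched.
function ToCodePage(const S: RawByteString; CP: Word): RawByteString;
begin
  Result := S;                    // plain assignment keeps bytes and code page
  SetCodePage(Result, CP, True);  // Result is writable, so this compiles
end;

var
  src: AnsiString;
  dst: RawByteString;
begin
  src := 'abc';                   // ASCII only, to keep the conversion trivial
  dst := ToCodePage(src, 65001);  // 65001 = UTF-8
  WriteLn('cp: ', StringCodePage(dst), ' len=', Length(dst));
end.
```

With non-ASCII data the result depends on the platform's conversion support (on Unix targets FPC typically needs a widestring manager unit such as cwstring), so the sketch deliberately makes no claim about the output for such input.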
DoDi