[fpc-devel] Unicode support (yet again)

Luiz Americo Pereira Camara luizmed at oi.com.br
Sun Sep 18 16:17:44 CEST 2011


On 18/9/2011 10:07, Hans-Peter Diettrich wrote:
> Luiz Americo Pereira Camara schrieb:
>> On 17/9/2011 11:46, Hans-Peter Diettrich wrote:
>>> Luiz Americo Pereira Camara schrieb:
>>>
>>>> The codepage of a RawByteString at runtime will keep the previous 
>>>> CodePage (65001 for UTF8, 1200 for UTF16) as opposed to changing to 
>>>> the RawByteString CodePage (65535) as I thought previously
>>>
>>> Delphi defines RawByteString=AnsiString, so there is no room for 
>>> UTF-16 in such a string.
>>
>> No. I was wrong. See Florian's email. RawByteString will keep the 
>> codepage (1200 = UTF16) and the data of the assigned string be UTF8, 
>> be UTF8.
>>
>>>
>>>> So the implementation would be:
>>>>
>>>> function FileGetAttr(const FileName: RawByteString): Longint;
>>>> begin
>>>> SetCodePage(FileName, 1200, True);
>>>
>>> Won't work, because of "const",
>>
>> Yes
>>
>>> and because UTF-16 is not a Byte (AnsiChar) string :-(
>>
>> No. See above. Search the net for the Delphi and Unicode doc by Marco Cantu
>
> Can you give me a link? I checked the XE documentation and RTL, and 
> could not find that RawByteString can hold UTF-16, and my test 
> confirms that:
>

http://edn.embarcadero.com/article/38980

You may read also:

http://www.micro-isv.asia/2008/08/using-rawbytestring-effectively/

> var
>   a: AnsiString;
>   u: UnicodeString;
>
> procedure test(r: RawByteString; cp: word);
> begin
>   WriteLn('in:  ', StringElementSize(r), ' cp: ', StringCodePage(r),
>     ' len=', length(r));
>   WriteLn('"', r, '"'); //writes garbage for non-OEM chars, of course
>   SetCodePage(r, cp, true);
>   WriteLn('out: ', StringElementSize(r), ' cp: ', StringCodePage(r),
>     ' len=', length(r));
>   a := r; //use the result, so that nothing can be optimized away
>   WriteLn('"', r, '"');
> end;
>
> This reveals the following behaviour:
>
> 1) UnicodeString is converted to AnsiString, before passed to test.
> 2) Setting codepage to 1200 doesn't change anything.
> 3) Conversion to UTF-8 seems to work (length changed).
> 4) Conversion from UTF-8 to Ansi results in an empty string.
>
> I'll ask in an Embarcadero group, in detail for [4].

Are you using Delphi XE or fpc?

I don't have Delphi XE. What I know is from those docs and these discussions.
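
On the "const" objection: a minimal, untested sketch of one way around it 
is to copy the parameter into a local RawByteString and call SetCodePage 
on the copy. AsUtf8 is a hypothetical helper name, not an RTL routine, and 
I target UTF-8 here (not 1200) because whether a RawByteString can carry 
UTF-16 is exactly the open question. It assumes an FPC version that 
already has the code page aware string routines.

```pascal
program RawByteDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of the RTL): returns a UTF-8 encoded copy
// of S and leaves the const parameter itself untouched.
function AsUtf8(const S: RawByteString): RawByteString;
begin
  Result := S;                         // RawByteString to RawByteString: no conversion
  SetCodePage(Result, CP_UTF8, True);  // convert the data and retag the copy
end;

var
  U: UTF8String;
  R: RawByteString;
begin
  U := 'abc';
  R := AsUtf8(U);
  // Same diagnostics as in the quoted test procedure
  WriteLn('cp: ', StringCodePage(R),
    ' elem size: ', StringElementSize(R),
    ' len=', Length(R));
end.
```

The copy costs nothing until SetCodePage actually has to convert, since 
the assignment only adds a reference to the same string data.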

Luiz


