[fpc-devel] Unicode support (yet again)

Luiz Americo Pereira Camara luizmed at oi.com.br
Sun Sep 18 02:15:51 CEST 2011


On 17/9/2011 11:46, Hans-Peter Diettrich wrote:
> Luiz Americo Pereira Camara schrieb:
>
>> The codepage of a RawByteString at runtime will keep the previous 
>> CodePage (65001 for UTF8, 1200 for UTF16) as opposed to changing to the 
>> RawByteString CodePage (65535), as I thought previously
>
> Delphi defines RawByteString=AnsiString, so there is no room for 
> UTF-16 in such a string.

No, I was wrong; see Florian's email. A RawByteString will keep the codepage 
and the data of the assigned string, be it UTF8 or UTF16 (codepage 1200).

>
>> So the implementation would be:
>>
>> function FileGetAttr(const FileName: RawByteString): Longint;
>> begin
>> SetCodePage(FileName, 1200, True);
>
> Won't work, because of "const",

Yes

> and because UTF-16 is not a Byte (AnsiChar) string :-(

No, see above. Search the net for the "Delphi and Unicode" document by 
Marco Cantù.

>> Result:=Integer(Windows.GetFileAttributesW(PWideChar(FileName)));
>
> Delphi would use
>
> Result:=Integer(Windows.GetFileAttributesW(PWideChar(string(FileName))));
>
> with a temporary UnicodeString variable and an according try-finally 
> block.

Yes

>> This way the version using UnicodeString parameter would have the 
>> benefit of being less verbose and use the possible optimizations of 
>> the implicit encoding conversion.
>
> At best it *hides* the temporary variables and implicit conversions, 
> but makes string handling more expensive.

I'm talking about:

function FileGetAttr(const FileName: UnicodeString): Longint;
begin
   Result:=Integer(Windows.GetFileAttributesW(PWideChar(FileName)));
end;


Inside the function there is no conversion, since the string is already 
UTF16; there is just a typecast to PWideChar, which is in fact a function call.

The conversion is done before the function call, and only if necessary 
(e.g. UTF8 -> UTF16). The decision to convert or not is made at compile time.


With RawByteString

function FileGetAttr(const FileName: RawByteString): Longint;
begin
   Result:=Integer(Windows.GetFileAttributesW(PWideChar(UnicodeString(FileName))));
end;


Here the decision to convert or not is made at run time by checking the 
CodePage of FileName. There is also one more temporary variable due to the 
UnicodeString typecast.

In summary:
With UnicodeString: the decision to convert is made at compile time
With RawByteString: the decision to convert is made at run time, plus one more temp variable
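
To make the two cases concrete, here is a minimal sketch, assuming FPC 3.0 
or later with codepage-aware strings. The helper names Utf16Units and 
Utf16UnitsRaw are hypothetical and only stand in for FileGetAttr, so that 
the example does not depend on the Windows unit.

```pascal
program ConvDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: the argument arrives already as UTF-16, so the body
// never converts. A caller holding another encoding gets a conversion
// inserted by the compiler at the call site.
function Utf16Units(const S: UnicodeString): Integer;
begin
  Result := Length(S);
end;

// Hypothetical helper: the argument keeps the code page it was created
// with, so the conversion to UTF-16 is chosen at run time, and the typecast
// needs a temporary UnicodeString.
function Utf16UnitsRaw(const S: RawByteString): Integer;
begin
  Result := Length(UnicodeString(S));
end;

var
  U8: UTF8String;
  U16: UnicodeString;
begin
  U8 := 'file.txt';
  U16 := 'file.txt';
  WriteLn(StringCodePage(U8));   // code page carried by the UTF-8 string
  WriteLn(Utf16Units(U16));      // no conversion: already UTF-16
  WriteLn(Utf16Units(U8));       // conversion inserted at the call site
  WriteLn(Utf16UnitsRaw(U8));    // conversion decided inside the function
end.
```

The string is kept to plain ASCII on purpose, so the result does not depend 
on which widestring manager is installed.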


> As I understand the FPC developers, they want to reduce the number of 
> implicit string conversions, which can be achieved best with dedicated 
> string types.

I'm just saying that ;-) UnicodeString is better than RawByteString.

Luiz


