[fpc-devel] Unicode support (yet again)
Luiz Americo Pereira Camara
luizmed at oi.com.br
Sun Sep 18 02:15:51 CEST 2011
On 17/9/2011 11:46, Hans-Peter Diettrich wrote:
> Luiz Americo Pereira Camara schrieb:
>
>> The codepage of a RawByteString at runtime will keep the previous
>> CodePage (65001 for UTF8, 1200 for UTF16) as opposed to changing to the
>> RawByteString CodePage (65535), as I thought previously
>
> Delphi defines RawByteString=AnsiString, so there is no room for
> UTF-16 in such a string.
No. I was wrong earlier; see Florian's email. A RawByteString keeps both the
codepage (e.g. 1200 = UTF16) and the data of the assigned string, whatever
its encoding.
>
>> So the implementation would be:
>>
>> function FileGetAttr(const FileName: RawByteString): Longint;
>> begin
>> SetCodePage(FileName, 1200, True);
>
> Won't work, because of "const",
Yes
> and because UTF-16 is not a Byte (AnsiChar) string :-(
No. See above. Search the net for the "Delphi and Unicode" document by Marco Cantù.
>> Result:=Integer(Windows.GetFileAttributesW(PWideChar(FileName)));
>
> Delphi would use
>
> Result:=Integer(Windows.GetFileAttributesW(PWideChar(string(FileName))));
>
> with a temporary UnicodeString variable and an according try-finally
> block.
Yes
>> This way the version using UnicodeString parameter would have the
>> benefit of being less verbose and use the possible optimizations of
>> the implicit encoding conversion.
>
> At best it *hides* the temporary variables and implicit conversions,
> but makes stringhandling more expensive.
I'm talking about:
function FileGetAttr(const FileName: UnicodeString): Longint;
begin
  Result:=Integer(Windows.GetFileAttributesW(PWideChar(FileName)));
end;
Inside the function there is no conversion, since the string is already
UTF16; there is just a typecast to PWideChar (which is in fact a function call).
The conversion is done before the function call, and only if necessary
(e.g. UTF8 -> UTF16). The decision to convert or not is made at compile time.
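A minimal sketch of the call-site behaviour described above, assuming a compiler with code-page-aware strings (Delphi 2009+ or FPC 3.0+). CharCount is a hypothetical stand-in for FileGetAttr so that the example does not depend on the Windows unit:

```pascal
program CallSiteConversion;
{$mode objfpc}{$H+}

// Hypothetical stand-in for FileGetAttr: the body only ever sees UTF-16 data.
function CharCount(const S: UnicodeString): SizeInt;
begin
  Result := Length(S);
end;

var
  U16: UnicodeString;
  U8: UTF8String;
begin
  U16 := 'file.txt';
  U8 := 'file.txt';
  // Argument is already a UnicodeString: passed as-is, no conversion.
  WriteLn(CharCount(U16));
  // Argument is a UTF8String: the compiler inserts the UTF8 -> UTF16
  // conversion here, at the call site, not inside CharCount.
  WriteLn(CharCount(U8));
end.
```

The compiler may warn about the implicit conversion in the second call; that warning is the compile-time decision made visible.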
With RawByteString:
function FileGetAttr(const FileName: RawByteString): Longint;
begin
  Result:=Integer(Windows.GetFileAttributesW(PWideChar(UnicodeString(FileName))));
end;
Here the decision to convert or not is made at run time, by checking the
CodePage of FileName. There is also one more temporary variable, due to the
UnicodeString typecast.
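To see the run-time side, the following sketch inspects the code page that a RawByteString parameter actually carries. It assumes FPC 3.0+ or Delphi 2009+, where StringCodePage is available; CodePageOf is a hypothetical helper, not an RTL routine:

```pascal
program RuntimeCodePage;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the dynamic code page of the argument.
// The parameter type is RawByteString, so no conversion happens on entry
// and the check has to be done at run time.
function CodePageOf(const S: RawByteString): Word;
begin
  Result := StringCodePage(S);
end;

var
  U8: UTF8String;
begin
  U8 := 'file.txt';
  // If the string keeps its previous code page, as described above, this
  // reports the UTF-8 code page (65001) rather than 65535.
  WriteLn(CodePageOf(U8));
end.
```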
In summary:
With UnicodeString: the decision to convert is made at compile time
With RawByteString: the decision to convert is made at run time, plus one more temp variable
> As I understand the FPC developers, they want to reduce the number of
> implicit string conversions, which can be achieved best with dedicated
> string types.
I'm just saying that ;-) UnicodeString is better than RawByteString
Luiz