[fpc-devel] Unicode in the RTL (my ideas)

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Aug 21 00:18:15 CEST 2012


Graeme Geldenhuys wrote:

>    {$IFDEF WINDOWS}
>       UnicodeString = type AnsiString(CP_UTF16);

AnsiStrings consist of bytes only, for good reasons (mostly 
performance). The Delphi developers wanted to implement what you 
suggest, but later dropped that approach.

String classes have the same performance problems, which is why in .NET, 
for example, it is suggested to use functions instead of string operators. 
Delphi and FPC use compiler magic instead of classes.

>    {$ELSE}
>       // probably not strictly correct, but assuming *nix here.
>       // But you get the idea
>       UnicodeString = type AnsiString(CP_UTF8);
>    {$ENDIF}
> 
>    String = type UnicodeString;
>    Char = type String[4];   // the maximum size of a Unicode codepoint is 4 bytes

A character type is somewhat useless unless all strings are UTF-32 
(which is quite unlikely now). Instead, substrings should be used, which 
can contain any number of bytes or characters.
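
To illustrate the point about character size, here is a minimal sketch 
(not from the original thread). It builds a few strings from raw UTF-16 
code units and prints how many UTF-16 code units and UTF-8 bytes each one 
occupies. The helpers FromCodeUnits and Show are made up for this example; 
Length and UTF8Encode are the usual FPC system routines.

```pascal
program CodeUnitWidths;
{$mode objfpc}{$H+}

{ Hypothetical helper: build a UnicodeString from raw UTF-16 code units. }
function FromCodeUnits(const CodeUnits: array of Word): UnicodeString;
var
  i: Integer;
begin
  Result := '';
  SetLength(Result, Length(CodeUnits));
  for i := 0 to High(CodeUnits) do
    Result[i + 1] := WideChar(CodeUnits[i]);
end;

{ Hypothetical helper: print the storage size in both encodings. }
procedure Show(const Name: string; const S: UnicodeString);
begin
  WriteLn(Name, ': ', Length(S), ' UTF-16 code unit(s), ',
    Length(UTF8Encode(S)), ' UTF-8 byte(s)');
end;

begin
  Show('U+0041 (A)', FromCodeUnits([$0041]));
  Show('U+00E9 (e acute)', FromCodeUnits([$00E9]));
  Show('U+20AC (euro sign)', FromCodeUnits([$20AC]));
  { A code point outside the BMP needs a surrogate pair in UTF-16. }
  Show('U+1F600 (surrogate pair)', FromCodeUnits([$D83D, $DE00]));
  { One visible character, but two code points. }
  Show('U+0065 U+0301 (e + combining acute)', FromCodeUnits([$0065, $0301]));
end.
```

The last entry is a single visible character made of two code points, so 
even a type sized for the largest code point would not hold every 
"character". A substring of variable length does.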

You also have to explain what String[4] means in a Unicode environment. 
The ShortString type does not have an encoding, and is thus deprecated 
in a Unicode environment.

Q: Did you ever read about the new string implementation of FPC?
Do you really want to reinvent the wheel, in another incompatible way?

DoDi



