[fpc-devel] Unicode in the RTL (my ideas)
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Aug 21 00:18:15 CEST 2012
Graeme Geldenhuys wrote:
> {$IFDEF WINDOWS}
> UnicodeString = type AnsiString(CP_UTF16);
AnsiStrings consist of bytes only, for good reasons (mostly
performance). The Delphi developers wanted to implement what you
suggest, but later dropped that approach.
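To illustrate what "bytes only" means in practice, here is a minimal sketch. It assumes FPC 3.0 or later, where code-page-aware AnsiString types like the one in the quoted proposal exist. An AnsiString declared with a code page still has 1-byte elements, and the code page only tags the data. UTF-16 text lives in the separate UnicodeString type, whose elements are 2-byte WideChars. The type name TUtf8Text is made up for this example.

```pascal
program ByteStrings;
{$mode objfpc}{$H+}

type
  // Code-page-aware AnsiString (FPC 3.0+ syntax). The code page tags the
  // data; the elements themselves remain single bytes (AnsiChar).
  TUtf8Text = type AnsiString(CP_UTF8);

var
  A: TUtf8Text;
  U: UnicodeString;
begin
  A := 'abc';
  U := 'abc';
  // Element size of any AnsiString, whatever its code page: 1 byte.
  WriteLn('AnsiString element size:    ', SizeOf(A[1]));
  // UnicodeString elements are UTF-16 code units (WideChar): 2 bytes.
  WriteLn('UnicodeString element size: ', SizeOf(U[1]));
  // Code page recorded for A at run time.
  WriteLn('Code page tag of A:         ', StringCodePage(A));
end.
```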
String classes have the same performance problems; in .NET, for
example, it is recommended to use functions instead of string operators.
Delphi and FPC use compiler magic instead of classes.
> {$ELSE}
> // probably not strictly correct, but assuming *nix here. But you get the idea
> UnicodeString = type AnsiString(CP_UTF8);
> {$ENDIF}
>
> String = type UnicodeString;
> Char = type String[4]; // the maximum size of a Unicode codepoint is 4 bytes
A character type is of little use unless all strings are UTF-32
(which is quite unlikely now). Substrings should be used instead,
since they can hold any number of bytes or characters.
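Here is a minimal sketch of the substring approach. It assumes UTF-8 data held in a RawByteString. Instead of a fixed-size Char, each code point is returned as a substring of 1 to 4 bytes. Utf8SeqLen and NextCodePoint are hypothetical helpers written for this example, not RTL functions. They look only at the lead byte and do no further validation.

```pascal
program CodePointSubstrings;
{$mode objfpc}{$H+}

// Hypothetical helper: length of the UTF-8 sequence that starts with
// LeadByte. Invalid lead bytes count as 1 so iteration always advances.
function Utf8SeqLen(LeadByte: Byte): SizeInt;
begin
  if LeadByte < $80 then Result := 1
  else if (LeadByte and $E0) = $C0 then Result := 2
  else if (LeadByte and $F0) = $E0 then Result := 3
  else if (LeadByte and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Hypothetical helper: returns the code point starting at byte position
// BytePos as a substring and advances BytePos past it.
function NextCodePoint(const S: RawByteString; var BytePos: SizeInt): RawByteString;
var
  Len: SizeInt;
begin
  Len := Utf8SeqLen(Ord(S[BytePos]));
  // Clip a sequence that is truncated at the end of the string.
  if BytePos + Len - 1 > Length(S) then
    Len := Length(S) - BytePos + 1;
  Result := Copy(S, BytePos, Len);
  Inc(BytePos, Len);
end;

var
  S, Piece: RawByteString;
  P: SizeInt;
begin
  // 'a', U+00E9, U+20AC and U+1F600 as explicit UTF-8 bytes, so the
  // example does not depend on the source file's encoding.
  S := 'a' + #$C3#$A9 + #$E2#$82#$AC + #$F0#$9F#$98#$80;
  P := 1;
  while P <= Length(S) do
  begin
    Piece := NextCodePoint(S, P);
    WriteLn('code point of ', Length(Piece), ' byte(s)');
  end;
end.
```

Run as is, it should report sequences of 1, 2, 3 and 4 bytes. A code point is still not always a user-perceived character (combining marks, for example). That is a further reason to work with substrings of arbitrary length rather than a fixed-size character type.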
You also have to explain what String[4] means in a Unicode environment.
The ShortString type does not have an encoding, and is therefore
deprecated in a Unicode environment.
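What String[4] actually is in FPC can be checked directly. It is a ShortString with one length byte and room for 4 data bytes. Nothing in the value records an encoding. Here is a small sketch; TFourBytes is a name made up for this example.

```pascal
program ShortStringLayout;
{$mode objfpc}

type
  // String[4] is a ShortString: a length byte followed by up to 4 bytes.
  TFourBytes = String[4];

var
  C: TFourBytes;
begin
  WriteLn('SizeOf(String[4]) = ', SizeOf(TFourBytes));
  // The 4-byte UTF-8 sequence for U+1F600 fills it completely...
  C := #$F0#$9F#$98#$80;
  WriteLn('Length = ', Length(C));
  // ...but no field says that these bytes are UTF-8 rather than,
  // say, four characters of some single-byte code page.
end.
```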
Q: Have you read about FPC's new string implementation?
Do you really want to reinvent the wheel, in yet another incompatible way?
DoDi