[fpc-devel] Unicode resource strings

Tue Aug 21 10:32:05 CEST 2012

On Mon, 20 Aug 2012 20:56:46 +0200
Florian Klämpfl <florian at freepascal.org> wrote:

>[...]
> The current situation is:
> - either somebody starts to implement support for unicodestring being
> utf-8 (or whatever) on linux in a compatible way with the current
> approach, then 2.8.0 will use this
> - nobody works on it, then 2.8.0 comes with unicodestring=utf-16 always.

IMO UnicodeString should have the same encoding on all platforms.
Otherwise the code unit size changes per platform, which is hard to
test and asking for trouble.

The compiler already supports a UTF8String, right?
If so, some functions can use UTF8String and others UnicodeString
(=UTF-16), and the compiler magic will convert between them
automatically.

The difficult decision is which functions and types should use UTF-8
and which UTF-16. This may depend on the platform.

One problem is that a UTF-8 or UTF-16 string can contain invalid
sequences, which makes a lossless conversion impossible.
For example, under Linux file names are usually treated as UTF-8, but
they are really just bytes. They can and do contain invalid UTF-8
sequences.
If your program has to handle such names, you must use a FindFirst
that works with UTF-8. To be clear: I am not saying the default
FindFirst under Linux must be UTF-8. I am only saying there must be
one version with UTF-8, e.g. FindFirstU8, and it must use the Linux
file functions directly, without conversions.
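
To illustrate why a conversion step in between is a problem, here is a
small sketch in Python (not Pascal, only to show the behaviour at the
byte level). The file name and the helper names are made up, and
replacing invalid bytes with U+FFFD is just one possible conversion
policy. I am not claiming the RTL does it this way.

```python
# A Linux file name is just a byte sequence; this one is not valid UTF-8
# because 0xFF can never appear in well-formed UTF-8.
raw_name = b"report-\xff.txt"


def is_valid_utf8(name: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def roundtrip_via_utf16(name: bytes) -> bytes:
    """Simulate UTF-8 -> UTF-16 -> UTF-8, as an API with UTF-16 strings would.

    Assumption: invalid input bytes are replaced with U+FFFD.
    """
    text = name.decode("utf-8", errors="replace")
    utf16 = text.encode("utf-16-le")
    return utf16.decode("utf-16-le").encode("utf-8")


converted = roundtrip_via_utf16(raw_name)

print(is_valid_utf8(raw_name))   # the original name is not valid UTF-8
print(converted == raw_name)     # the round trip does not give the name back
```

After the round trip the program holds a name that does not exist on
disk, so it can list the file but not open it. A function that passes
the bytes through untouched does not have this problem.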

I guess there is no good solution for TStrings. Whatever string type is
chosen, some programs will suffer.

Mattias