[fpc-devel] UTF8 RTL

Jonas Maebe jonas.maebe at elis.ugent.be
Wed Nov 19 09:22:21 CET 2014


On 19/11/14 09:12, Marco van de Voort wrote:
> In our previous episode, Jonas Maebe said:
>>> As Jonas said, not using utf8 on Windows.
>>
>> No, that's not what I said. There is no problem with using UTF-8 on Windows.
> 
> As long as you explicitly use utf8string.

An ansistring with a dynamic code page of UTF-8 will also work fine with
the adapted RTL routines. What will of course not work is an ansistring
containing UTF-8 data while its dynamic code page is different from
CP_UTF8 (that includes CP_ACP, even if DefaultSystemCodePage happens to
be CP_UTF8), as in the current Lazarus convention.

That is however wrong on all platforms, not just on Windows (no one should
ever assume that the system code page on a Unix platform is UTF-8
either; that is the same as assuming that the keyboard layout is QWERTY or
that the processor is little endian).
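A minimal Free Pascal sketch of why the dynamic code page, not the bytes, decides how a string is interpreted (assuming FPC 3.0 or later with code page-aware ansistrings; the helper name DecodedLength is hypothetical). The same two bytes, $C3 $A9 (UTF-8 for "é"), are retagged with SetCodePage without converting them and then converted to UnicodeString. Tagged as CP_UTF8 they decode to one character; tagged as code page 1252 they decode to the two characters "Ã©". On Unix the cwstring unit is included so that a full widestring manager performs the conversions.

```pascal
program DynamicCodePageDemo;

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; { full widestring manager, so conversions to/from code page 1252 work }
{$endif}

{ Hypothetical helper: declares the bytes in Bytes to be encoded in code
  page CP (the bytes themselves are not converted) and returns how many
  UTF-16 code units the text decodes to. }
function DecodedLength(const Bytes: RawByteString; CP: TSystemCodePage): SizeInt;
var
  Tagged: RawByteString;
begin
  Tagged := Bytes;
  { Convert = False: only the dynamic code page in the string header changes }
  SetCodePage(Tagged, CP, False);
  { The conversion to UnicodeString uses the dynamic code page of Tagged }
  Result := Length(UnicodeString(Tagged));
end;

begin
  { $C3 $A9 is the UTF-8 encoding of U+00E9 }
  WriteLn('Tagged as CP_UTF8: ', DecodedLength(#$C3#$A9, CP_UTF8));
  { Same bytes, but tagged as Windows-1252 }
  WriteLn('Tagged as 1252:    ', DecodedLength(#$C3#$A9, 1252));
end.
```

Retagging with Convert = False is exactly what goes wrong when UTF-8 data sits in a string whose dynamic code page says something else: nothing fails at the point of assignment, the text is simply decoded wrongly later on.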

>>> A TStringList with ansistrings
>>> in it passed to an RTL routine will be seen as ansi.
>>
>> That is incorrect (although right now there are no RTL routines that
>> accept stringlists and that are also codepage-aware).
> 
> But I meant that even if you use utf8string in many places, as soon as you
> stuff it in a container like tstringlist, that is gone (forced ansi
> conversion, since tstringlist's interface is defined using plain string(0)).

If you use utf8string, yes, but then the problem indeed occurs when the
string is added to the tstringlist, not when the tstringlist is passed
to an RTL routine (which is how I understood your initial comment).
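A minimal sketch of where that conversion happens (assuming FPC 3.0 or later and {$H+}, where TStringList.Add takes a plain string, i.e. AnsiString(CP_ACP); the helper name CodePageInStringList is hypothetical). The UTF8String is converted to the declared code page of Add's parameter at the moment it is added, so the string the list holds carries DefaultSystemCodePage rather than CP_UTF8. Any character that code page cannot represent is already lost at that point, before the list is ever passed to another routine.

```pascal
program StringListConversionDemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} Classes;

{ Hypothetical helper: stores U in a TStringList and returns the dynamic
  code page of the string that the list actually holds. }
function CodePageInStringList(const U: UTF8String): TSystemCodePage;
var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    { Add expects a plain "string" (AnsiString(CP_ACP) under $H+), so U is
      converted to DefaultSystemCodePage here, when it enters the list }
    List.Add(U);
    Result := StringCodePage(List[0]);
  finally
    List.Free;
  end;
end;

var
  U: UTF8String;
begin
  U := 'abc';
  WriteLn('Code page of the UTF8String:  ', StringCodePage(U));
  WriteLn('Code page of the list entry:  ', CodePageInStringList(U));
  WriteLn('DefaultSystemCodePage:        ', DefaultSystemCodePage);
end.
```

On a system whose DefaultSystemCodePage is CP_UTF8 the entry keeps its UTF-8 bytes; elsewhere (e.g. a Windows system using code page 1252) it is converted, which is the forced ansi conversion described above.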


Jonas
