[fpc-devel] UTF8 RTL

Marco van de Voort marcov at stack.nl
Sun Nov 23 17:42:06 CET 2014

In our previous episode, Mattias Gaertner said:
> > Let's try to understand first why do you insist on the "UTF-8" in the name ?
> > 
> > Maybe "UTF-8 aware" is better, if you really want the UTF-8 in the name.
> Maybe there is a misunderstanding. At least I can't follow you here.
> I started the thread about ParamStr, which only supports the system
> codepage. I would like to improve it so that it supports
> DefaultSystemCodepage. Or at least add a Unicode version of
> ParamStr.

And the 2-byte Unicode version exists, in unit uuchar (the "objpas" of
$mode delphiunicode).  For now, simply make a UTF-8 wrapper that returns an

> Some people have called the RTL with UnicodeString the "Unicode RTL",
> and the Ansistring RTL with system codepage "Ansi RTL".

Well, that's what Windows calls them (-W are Unicode, -A are ANSI).
Delphi follows that terminology (D2007- being ansi, D2009+ being unicode).

> I thought "UTF8 RTL" is analog, short and unambiguous.
> Obviously I was wrong.

The RTL leans somewhat toward the preferred encoding on each target, so 1-byte
on *nix and 2-byte on Windows. That means that there is no real UTF-8 support
on Windows other than the generic codepage-aware string type (that goes for
all 1-byte encodings).

Setting DefaultSystemCodePage to UTF-8 will make all autoconversions to
ansistring(0) return UTF-8, including when calling e.g. functions from unit
windows. I think it would be very wise to be careful with that, and have an
extensive trial

You might want to keep the current -utf8 routines as mere codepage
correcting wrappers.
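
To see the effect of changing DefaultSystemCodePage for yourself, something
like the following can be tried. It is only a rough sketch, not an RTL
recommendation: it assumes a compiler with codepage-aware strings (FPC 3.0 or
a comparable trunk), the program and variable names are made up, and on *nix
it pulls in cwstring so that a widestring manager is installed. No output is
shown, because the values depend on the platform and locale.

```pascal
program cpdemo;
{$mode objfpc}{$H+}

{$ifdef unix}uses cwstring;{$endif}  // assumption: widestring manager on *nix

var
  u: UnicodeString;
  a: AnsiString;   // declared codepage CP_ACP, i.e. ansistring(0)
begin
  // Codepage the RTL currently uses for ansistring(0) conversions.
  writeln('DefaultSystemCodePage before: ', DefaultSystemCodePage);

  // Switch the default codepage for ansistring(0) to UTF-8.
  SetMultiByteConversionCodePage(CP_UTF8);
  writeln('DefaultSystemCodePage after:  ', DefaultSystemCodePage);

  // Build a UTF-16 string containing one non-ASCII character (u-umlaut).
  u := UnicodeString('gr') + WideChar($00FC) + UnicodeString('n');

  // Implicit UTF-16 -> ansistring(0) conversion; this is the autoconversion
  // that now follows the changed DefaultSystemCodePage.
  a := u;

  writeln('StringCodePage(a) = ', StringCodePage(a));
  writeln('Length(a) in bytes = ', Length(a));
end.
```

Comparing the reported codepage and byte length with and without the
SetMultiByteConversionCodePage call shows which conversions are affected;
any ansistring(0) passed on to 1-byte OS APIs is affected the same way.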

> (And, yes, I know, that all three names "Unicode|Ansi|UTF8 RTL" are
> not 100% correct from a technical point of view.)

The filesystem routines are now encoding-agnostic. But that assumes you use
a type whose associated encoding the compiler knows.

But filesystem routines are only a small part of the system libraries.
