[fpc-devel] UTF8 RTL
Marco van de Voort
marcov at stack.nl
Sun Nov 23 17:42:06 CET 2014
In our previous episode, Mattias Gaertner said:
> > Let's try to understand first why do you insist on the "UTF-8" in the name ?
> > Maybe "UTF-8 aware" is better, if you really want the UTF-8 in the name.
> Maybe there is a misunderstanding. At least I can't follow you here.
> I started the thread about ParamStr, which only supports the system
> codepage. I would like to improve it so that it supports
> DefaultSystemCodepage. Or at least add an Unicode version of
And the 2-byte unicode version exists, in unit uuchar. (the "objpas" of
$mode delphiunicode). For now, simply make a utf8 wrapper that returns an
> Some people have called the RTL with UnicodeString the "Unicode RTL",
> and the Ansistring RTL with system codepage "Ansi RTL".
Well, that's what Windows calls them (-W are unicode, -A are ansi).
Delphi follows that terminology (D2007- being ansi, D2009+ being unicode).
> I thought "UTF8 RTL" is analogous, short and unambiguous.
> Obviously I was wrong.
The RTL leans somewhat towards the preferred encoding on each target, so 1-byte on
*nix and 2-byte on Windows. That means that there is no real utf8 support on
Windows other than the generic codepage-aware string type (that goes for all
Setting defaultsystemcodepage will make all autoconversions to ansistring(0)
return utf8, so also when calling e.g. unit windows functions. I think it
would be very wise to be careful with that, and have an extensive trial
You might want to keep the current -utf8 routines as mere codepage
> (And, yes, I know, that all three names "Unicode|Ansi|UTF8 RTL" are
> not 100% correct from a technical point of view.)
The filesystem routines are now encoding agnostic. But that assumes you use
a type whose associated encoding the compiler knows.
But filesystem routines are only a small part of the system libraries.
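
As a rough illustration of "a type whose encoding the compiler knows", and of the utf8 wrapper idea mentioned earlier, something along these lines might work. This is only a sketch: ParamStrUTF8 is a hypothetical name, not an existing RTL routine, and it assumes a compiler with codepage-aware strings (the 2.7.1 trunk or later).

```pascal
program cpdemo;
{$mode objfpc}{$H+}

{ Hypothetical helper, NOT part of the RTL: wraps the existing ParamStr
  and returns the value in a string type declared as UTF-8. }
function ParamStrUTF8(Index: LongInt): UTF8String;
begin
  { The declared code pages of source and target differ, so the compiler
    is expected to insert a code page conversion on this assignment. }
  Result := ParamStr(Index);
end;

var
  s: UTF8String;
begin
  s := ParamStrUTF8(0);
  { StringCodePage reports the code page stored with the string data. }
  WriteLn('code page of s:        ', StringCodePage(s));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

Note that this only converts whatever the existing ParamStr returned. If the argument was already reduced to the system codepage before that point, characters outside that codepage are not recovered by the wrapper.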