[fpc-devel] UTF8 RTL
Marco van de Voort
marcov at stack.nl
Sun Nov 23 17:42:06 CET 2014
In our previous episode, Mattias Gaertner said:
> > Let's first try to understand why you insist on the "UTF-8" in the name.
> >
> > Maybe "UTF-8 aware" is better, if you really want the UTF-8 in the name.
>
> Maybe there is a misunderstanding. At least I can't follow you here.
>
> I started the thread about ParamStr, which only supports the system
> codepage. I would like to improve it so that it supports
> DefaultSystemCodepage. Or at least add a Unicode version of
> ParamStr.
The 2-byte Unicode version already exists, in unit uuchar (the "objpas" of
$mode delphiunicode). For now, simply write a UTF-8 wrapper around it that
returns a utf8string.
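
A minimal sketch of such a wrapper follows. The name ParamStrUTF8 is
hypothetical, and the code assumes a compiler version in which unit uuchar
exports a ParamStr that returns a UnicodeString. Note that using uuchar also
redefines char and pchar as their 2-byte variants in the using module, so in
real code the wrapper would probably live in a small unit of its own.

```pascal
program paramutf8;
{$mode objfpc}{$H+}

uses
  uuchar; // assumed to provide ParamStr(Longint): UnicodeString

// Hypothetical wrapper: fetch the 2-byte parameter and re-encode it as UTF-8.
function ParamStrUTF8(Index: Longint): UTF8String;
begin
  Result := UTF8Encode(uuchar.ParamStr(Index));
end;

var
  i: Longint;
begin
  // Print the byte length only, to stay clear of console encoding issues.
  for i := 0 to ParamCount do
    WriteLn(i, ': ', Length(ParamStrUTF8(i)), ' bytes');
end.
```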
> Some people have called the RTL with UnicodeString the "Unicode RTL",
> and the Ansistring RTL with system codepage "Ansi RTL".
Well, that's what Windows calls them (-W are unicode, -A are ansi).
Delphi follows that terminology (D2007- being ansi, D2009+ being unicode).
> I thought "UTF8 RTL" is analogous, short and unambiguous.
> Obviously I was wrong.
The RTL leans somewhat towards the preferred encoding on each target, so
1-byte on *nix and 2-byte on Windows. That means there is no real UTF-8
support on Windows other than the generic codepage-aware string type (the
same goes for all 1-byte encodings).
Setting DefaultSystemCodePage to UTF-8 will make all automatic conversions to
ansistring(0) return UTF-8, including when calling e.g. unit windows
functions. I think it would be very wise to be careful with that, and to have
an extensive trial period.
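
To illustrate the effect, here is a small sketch. It assumes a compiler with
codepage-aware strings, where SetMultiByteConversionCodePage, StringCodePage
and CP_UTF8 are available from unit system. The cwstring line is my
assumption about what some *nix setups need for the conversion to work.

```pascal
program cpdemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: install a widestring manager on *nix
{$endif}

var
  u: UnicodeString;
  a: AnsiString; // i.e. ansistring(0), follows DefaultSystemCodePage

begin
  u := 'caf';
  u := u + WideChar($00E9); // append U+00E9

  a := u; // automatic conversion to the current default codepage
  WriteLn('before: codepage ', StringCodePage(a), ', ', Length(a), ' bytes');

  SetMultiByteConversionCodePage(CP_UTF8); // sets DefaultSystemCodePage
  a := u; // the same assignment now yields UTF-8 encoded bytes
  WriteLn('after:  codepage ', StringCodePage(a), ', ', Length(a), ' bytes');
end.
```

Anything that receives such a string and still expects the old system
codepage would now get UTF-8 bytes instead, which is the reason for the
caution.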
You might want to keep the current -utf8 routines as mere
codepage-correcting wrappers.
> (And, yes, I know, that all three names "Unicode|Ansi|UTF8 RTL" are
> not 100% correct from a technical point of view.)
The filesystem routines are now encoding agnostic. But that assumes you use
a type whose associated encoding the compiler knows.
But filesystem routines are only a small part of the system libraries.
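
As a rough sketch of what "a type whose encoding the compiler knows" means in
practice: the snippet below assumes that sysutils offers FileExists for both
1-byte and 2-byte string types, and the file name is made up. Each variable
carries its encoding in its declared type, so the RTL can convert to
whatever the target's filesystem API expects.

```pascal
program fsdemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

var
  NameU: UnicodeString; // 2-byte encoding, known from the type
  Name8: UTF8String;    // 1-byte encoding with codepage UTF-8, known from the type

begin
  NameU := 'example.txt'; // hypothetical file name
  Name8 := UTF8Encode(NameU);

  // Both calls should refer to the same file, whatever the system codepage is.
  WriteLn('UnicodeString: ', FileExists(NameU));
  WriteLn('UTF8String:    ', FileExists(Name8));
end.
```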