[fpc-devel] Unicode support (yet again)

Felipe Monteiro de Carvalho felipemonteiro.carvalho at gmail.com
Wed Sep 14 08:48:39 CEST 2011


On Tue, Sep 13, 2011 at 9:23 PM, Michael Van Canneyt
<michael at freepascal.org> wrote:
> Current strategy on fpc core seems to be to have 2 RTLs:
>
> One with unicode string, one with ansistring.

Isn't that somewhat nasty for people currently using UTF-8?

I mean, let's say that we can divide everyone using FPC into 3 groups:

1st> People using ansi that don't want to change any line of code ->
They get a path forward with this proposal, even if temporary (the
Ansi half of the RTL really seems like the definition of deprecated to
me)
2nd> People using UTF-8 -> They get no love at all and must choose
between staying on the old RTL with no Unicode, putting some tape over
the holes, or migrating to something incompatible.
3rd> People that want to use UTF-16 -> They get a new RTL to move forward

But what percentage of FPC users, libraries and applications is in each group?

1st> I really can't imagine anyone who would want to stay stuck to the
pre-Unicode world forever...
2nd> The vast majority of users, libraries and applications through Lazarus
3rd> msegui and possibly Delphi 2009+ users

Lazarus is by far the most widely used way to use FPC, so I would guess
that group 2 has more than 75% of all users, and still it gets no
love at all. Which real path forward is provided for these users?

Of course one path is migrating everything, the LCL, the IDE, SynEdit,
all packages, etc, to UTF-16, but that's a huge, immense amount of work
with zero advantages over what we are doing now. It's migrating just to
migrate, and who will be motivated to do that? My point is that it is
not very reasonable to migrate so much working code for no advantage at
all, so the Unicode RTL could provide something to ease interfacing
with UTF-8, for example:

* overloaded versions of routines and methods for utf8string
* A TStrings and TStringList for utf8

These would need to be ifdefed so they are not present in the Ansi
RTL. Without even a TStrings for utf-8 one cannot really expect
Lazarus to be able to use the Unicode RTL without doing a full
migration to UTF-16 ...
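
To make the overload idea concrete, here is a minimal sketch. It is an
illustration only, not an existing RTL routine: CodeUnitCount is a
made-up name, and UNICODE_RTL is a hypothetical define standing in for
"this is the Unicode RTL". It only shows that a utf8string overload can
sit next to a unicodestring one and be compiled out for the Ansi RTL.

```pascal
program Utf8OverloadSketch;

{$mode objfpc}{$H+}

{ UNICODE_RTL is a hypothetical define, used here only for illustration. }
{$DEFINE UNICODE_RTL}

{ Hypothetical routine, UTF-16 flavour: returns the number of
  UTF-16 code units in S. }
function CodeUnitCount(const S: UnicodeString): Integer; overload;
begin
  Result := Length(S);
end;

{$IFDEF UNICODE_RTL}
{ Hypothetical UTF-8 overload, present only in the Unicode RTL:
  returns the number of UTF-8 code units (bytes) in S. }
function CodeUnitCount(const S: UTF8String): Integer; overload;
begin
  Result := Length(S);
end;
{$ENDIF}

var
  U8: UTF8String;
  U16: UnicodeString;
begin
  U8 := 'abc';
  U16 := 'abc';
  { The compiler picks the overload that matches the argument type
    exactly, so no conversion between UTF-8 and UTF-16 takes place. }
  WriteLn(CodeUnitCount(U8));
  WriteLn(CodeUnitCount(U16));
end.
```

With the define removed, only the unicodestring version remains, which
is the "not present in the Ansi RTL" part of the suggestion.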

My final point is just: why not? If code in the RTL could fix things
for Lazarus why impose the need to migrate so much working code?

If the Unicode RTL provides UTF-8 support too then Lazarus projects
could be migrated by just doing 2 things:

1> Change all places which use TStrings and TStringList to
TStringsUTF8 and TStringListUTF8
2> Change all places which convert utf-8 to ansi before calling the
RTL so that they call it with no conversion at all
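
Step 2 could look roughly like the sketch below. Both routines are
hypothetical stand-ins written for this example (AnsiRoutine plays an
Ansi RTL routine, Utf8Routine plays a UTF-8 overload the Unicode RTL
might offer); only Utf8ToAnsi is a real System unit function.

```pascal
program MigrationSketch;

{$mode objfpc}{$H+}

{ Hypothetical stand-in for a routine of the Ansi RTL. }
function AnsiRoutine(const S: AnsiString): Integer;
begin
  Result := Length(S);
end;

{ Hypothetical stand-in for a UTF-8 overload in the Unicode RTL. }
function Utf8Routine(const S: UTF8String): Integer;
begin
  Result := Length(S);
end;

var
  Caption: UTF8String;
begin
  Caption := 'menu';

  { Before: the UTF-8 string is converted at the RTL boundary. }
  WriteLn(AnsiRoutine(Utf8ToAnsi(Caption)));

  { After: the UTF-8 string is passed through unchanged. }
  WriteLn(Utf8Routine(Caption));
end.
```

The migration then amounts to deleting the conversion call at each
such call site, rather than changing the string type everywhere.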

On the other hand, if we have no path forward except for migrating to
UTF-16, I can imagine we will still be talking about how to move
forward 5 years from now...

-- 
Felipe Monteiro de Carvalho


