[fpc-devel] Unicode support (yet again)

Marco van de Voort marcov at stack.nl
Wed Sep 14 12:03:47 CEST 2011


In our previous episode, Felipe Monteiro de Carvalho said:
> <michael at freepascal.org> wrote:
> > One with unicode string, one with ansistring. They will have the same code,
> > but will be compiled twice, each time with a different compiler define to
> > decide which version it must be.
> 
> Is this possible in UNIX? I can see that in Windows you can use the
> trick to use W versions which are identical except for the string type
> and drop Windows 9x support, but is this really possible for the UNIX
> syscalls? They expect UTF-8 not UTF-16 which is what UnicodeString
> uses.

AFAIK Qt and many other higher-level libs always use UTF-16. MSE does too. It
might also be useful for the JVM port.

The main reason is that users can pick the RTL that is closest to the
codebases they have and to the platform they focus on.

UTF-16, e.g., when leaning toward the Delphi 2009 model or for MSE, and UTF-8
for the rest.

It also provides transitional ability. Everybody can make changes (and test
with the "other" RTL) at their own pace, and nobody gets forced into a
situation that is disadvantageous for them.

In theory you could even make one-byte ANSI and one-byte UTF-8 versions on
Windows (so three RTLs), though I think that long term only ANSI and UTF-16
are viable on Windows, since those are the native encodings.

Note that the encoding type of the RTL mainly means the main string type used
in FPC and the default meaning of "char/pchar/string" in delphi/unicode mode
(a modeswitch on top of objfpc/delphi?).
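
A minimal sketch of the "same code, compiled twice" idea. The define
RTL_UTF16 and the aliases RTLString/RTLChar are made up for illustration;
they are not existing compiler symbols or RTL types.

```pascal
program rtlstringdemo;
{$mode objfpc}{$H+}

{ RTL_UTF16 is a hypothetical define (e.g. passed as -dRTL_UTF16). }
type
{$ifdef RTL_UTF16}
  RTLString = UnicodeString;  { UTF-16, two-byte code units }
  RTLChar   = WideChar;
{$else}
  RTLString = AnsiString;     { one-byte code units }
  RTLChar   = AnsiChar;
{$endif}

{ Same source for both builds; only the type aliases differ. }
function CountChar(const S: RTLString; C: RTLChar): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if S[i] = C then
      Inc(Result);
end;

begin
  WriteLn('code unit size: ', SizeOf(RTLChar));
  WriteLn(CountChar('banana', 'a'));
end.
```

Building this once with and once without the define gives the two variants
from one source, which is roughly what would happen per RTL unit.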

It does not mean that you can't keep parts of the source code UTF-8 or ANSI.
It just might be less optimal, and those parts will have to be properly
typed. This will also happen with packages where upgrading to Unicode doesn't
make sense.
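
Roughly what "properly typed" could look like for a part that stays UTF-8
while the main string type is UTF-16. This is only a sketch: the variable
names are made up, and the conversions are written out explicitly with
UTF8Encode/UTF8Decode from the system unit rather than relying on any
implicit conversion.

```pascal
program typedparts;
{$mode objfpc}{$H+}

var
  Legacy: UTF8String;     { a part of the code that stays UTF-8 }
  Native: UnicodeString;  { main string type of a UTF-16 RTL }
begin
  Native := 'caf';
  Native := Native + WideChar($00E9);  { append e-acute, one UTF-16 unit }

  { Crossing the boundary: the declared types say which encoding is which. }
  Legacy := UTF8Encode(Native);
  WriteLn(Length(Native));  { 4 UTF-16 code units }
  WriteLn(Length(Legacy));  { 5 bytes, e-acute takes two in UTF-8 }

  Native := UTF8Decode(Legacy);
  WriteLn(Length(Native));  { 4 again after the round trip }
end.
```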

Note that these are just the broad outlines of a possible solution. The main
disadvantages are that the encoding-dependent parts of the RTL are duplicated
(relatively minor, since many conversions will happen automatically when
everything is properly typed) and the release angle.

But it will be beneficial to everybody, and it is clear to everybody how
something should behave, so there will be no endless bickering over details
and workarounds as in this thread. It is a structured approach.

BTW: I explained all this to you, including not dropping legacy support, over
some Chinese food a few months ago. Don't you remember?


