[fpc-devel] Unicode RTL

Wed Nov 16 14:42:59 CET 2005

Daniël Mantione wrote:
> On Wed, 16 Nov 2005, Tomas Hajny wrote:
>
>> Big overhead (double maintenance effort for all targets supporting this
>> schism). :-( I'd say it's better to identify the weak points one by one
>> and address them case by case.
>
> I know, I'm all for abolishing Chinese (and perhaps Korean), the only
> language(s) that absolutely cannot be written in an 8-bit code.....

Well, I didn't suggest abolishing of any languages (or maybe... - C#? ;-) ).

> However, it won't work that way in this world, and people who use other
> scripts also tend to like Unicode.
>
> Take the FCL, for example: Tstringlist uses ansistrings.

This is one particular weak point which needs to be addressed, IMHO.

> Now there are several solutions:
> 1. Make a Twidestringlist
> 2. Make Tstringlist use widestrings internally and add methods for both
>    ansistrings and widestrings.
> 3. Make an FCL with ansistrings and an FCL with widestrings.
>
> I have been convinced that 3 causes the least maintenance trouble and the
> least overhead for people who don't need it. The reason is that strings
> are used everywhere in almost any library. It cannot be reduced to a few
> weak points :/

You're right that strings are used everywhere, but I don't think this
really means that you need to add special support for widestrings
everywhere. In many places you can pass a DBCS/MBCS string (e.g. encoded
in UTF-8) today and it wouldn't cause any harm. From my point of view, you
need some kind of special support mainly for sort operations (which
includes your TList) and then for visual classes (length of text for
controls, etc.). In addition, you certainly need proper routines for I/O.

However, your particular example from the forum discussion is IMHO
conceptually wrong. A string simply cannot be reversed this way (by
design, this is unsupported for DBCS/MBCS texts, not to mention that the
example is "somewhat" artificial). People who want to perform such an
operation need to analyse the problem and design the implementation
properly, in this case probably by translating the ansistring to a
widestring first. How this translation is performed is another question,
and it depends on the programmer's decision. It could be that the string
already _is_ a UCS-2 string (in which case "translation" to widestring
means just copying it byte by byte), it could be UTF-8, or it could even
be a simple string created in a particular codepage (SBCS). This is the
programmer's decision (a trade-off between the widest support and the
best performance), just as he has to decide whether to use multi-platform
APIs or the native API of a particular platform, or whether to use/import
the XxxxW or XxxxA API functions in his Win32 application.
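To illustrate why reversing a string element by element is unsupported for MBCS text, and why sorting needs special support, here is a minimal sketch. It is written in Python purely for brevity (it is not FPC code), the helper names are hypothetical, and it assumes the ansistring holds UTF-8; decoding to code points stands in for the translation to a widestring described above.

```python
# Illustration only: Python, not Free Pascal. The helper names are
# hypothetical and not part of any FPC/FCL API. Input is assumed to be UTF-8.
from typing import List


def reverse_bytes(data: bytes) -> bytes:
    """Naive reversal: treats every byte as one character (SBCS assumption)."""
    return data[::-1]


def reverse_code_points(data: bytes) -> bytes:
    """Decode UTF-8 to code points first, reverse those, then re-encode.

    Note: this still splits combining sequences such as 'e' + U+0308.
    """
    return data.decode("utf-8")[::-1].encode("utf-8")


def naive_sort(words: List[bytes]) -> List[bytes]:
    """Byte-wise sort of UTF-8 strings (same result as code point order)."""
    return sorted(words)


if __name__ == "__main__":
    name = "Daniël".encode("utf-8")  # 'ë' occupies two bytes in UTF-8
    try:
        reverse_bytes(name).decode("utf-8")
    except UnicodeDecodeError as exc:
        print("byte-wise reversal gives invalid UTF-8:", exc.reason)
    print(reverse_code_points(name).decode("utf-8"))

    # Code point order puts 'émile' after 'zebra', which is not what a
    # typical dictionary (collation) order would expect.
    words = [w.encode("utf-8") for w in ("zebra", "émile", "apple")]
    print([w.decode("utf-8") for w in naive_sort(words)])
```

Even the code point version is only a partial answer: a combining sequence (e.g. "e" followed by U+0308) would still be split, which supports the point that the right approach depends on the text and has to be chosen by the programmer.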

Maybe I'm still overlooking the real issues. Please give me more concrete
examples that cannot be resolved at the moment; we could discuss them
(and then possibly conclude that a separate RTL would be better or
necessary).

Tomas