[fpc-devel] Unicodestring branch, please test and help fixing

Wed Sep 10 15:37:21 CEST 2008

Michael Van Canneyt wrote:
> You are mixing 2 things. There is the actual string content, and there is the
> string metadata. The metadata is something that would apply for flyweight
> pattern. There is nothing to be gained by putting the metadata in an object,

This is true --up to a point.

And, that point arises when you wish to be able to work further with a 
TCharacter.

Say, you're doing text processing --display and all. You would 
definitely like to be able to derive a new class from TCharacter and 
call it, say, TWPCharacter which contains all sorts of other properties, 
color, style, font, size etc.

This would make life immensely easier for jobs where a character 
needs more attributes than exist in the base class.

> since there is only the encoding. Storing the encoding in an object is
> ridiculous and a waste of heap space. a 2 byte encoding is less wasteful
> than a 4 or 8 byte object pointer.

I am afraid I do not agree with this at all. Or rather, it comes across 
as a very ANSI-centric view.

You definitely need a 'language' attribute for a character.

'Locale' does not cut it simply because you can have mixed text i.e. 
portions that belong to a different language.

Some unusual characters in my locale (say, Turkish) do not necessarily 
mean that that piece of string is in another language --it may well be a 
transcription of /my/ name in a different character set (say, Greek).

Yet, we all know that (upper-, lower-, title-) casing has nothing to do 
with the encoding; nor does collation order.

In the above example, I used Turkish and Greek {what an unfortunate 
pairing, some might say :) } on purpose:

Both follow their own case folding rules, as well as their own 
collation orders, and both of those depend on a language 
attribute/property.
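
To make the casing point concrete, here is a minimal sketch in Python 
(not FPC code). The override tables and the names upper_for/lower_for 
are hypothetical and cover only the Turkish dotted/dotless i; a real 
implementation would take its data from the Unicode special casing 
tables.

```python
# Hypothetical per-language overrides (assumption: only Turkish "tr" is listed).
UPPER_OVERRIDES = {
    "tr": {"i": "\u0130", "\u0131": "I"},  # i -> İ, ı -> I
}
LOWER_OVERRIDES = {
    "tr": {"I": "\u0131", "\u0130": "i"},  # I -> ı, İ -> i
}


def upper_for(text, language):
    """Uppercase text character by character, honouring language overrides.

    Per-character mapping ignores context-sensitive rules, so this is
    only an illustration of why the language has to be known.
    """
    overrides = UPPER_OVERRIDES.get(language, {})
    return "".join(overrides.get(ch, ch.upper()) for ch in text)


def lower_for(text, language):
    """Lowercase text character by character, honouring language overrides."""
    overrides = LOWER_OVERRIDES.get(language, {})
    return "".join(overrides.get(ch, ch.lower()) for ch in text)


# Same bytes, same encoding, different result depending on the language.
turkish = upper_for("istanbul", "tr")
english = upper_for("istanbul", "en")
```

The encoding is identical in both calls; only the language argument 
changes the outcome.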

Without a language attribute, how would you handle these sorts of issues?

Using a parallel byte array?

Really?

Wouldn't it be a lot more humane to us developers if the TCharacter had 
properties such as

-- Language
-- CollationOrder
-- UpperCase
-- LowerCase
-- TitleCase

where, on setting the Language property, all others get filled with their 
correct values and are read-only.
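
A rough sketch of what I mean, written in Python for brevity rather 
than as an FPC class. The class name Character, its property names and 
the tiny casing table are all hypothetical; CollationOrder is left out 
because it needs real collation data.

```python
class Character:
    """Hypothetical sketch of the TCharacter idea; not an existing API."""

    # Assumption: only the Turkish i/ı special cases are listed here.
    _UPPER = {"tr": {"i": "\u0130", "\u0131": "I"}}
    _LOWER = {"tr": {"I": "\u0131", "\u0130": "i"}}

    def __init__(self, char, language="und"):
        self._char = char
        self.language = language  # runs the setter, which fills the rest

    @property
    def language(self):
        return self._language

    @language.setter
    def language(self, value):
        # Setting the language recomputes every derived value.
        self._language = value
        upper_map = self._UPPER.get(value, {})
        lower_map = self._LOWER.get(value, {})
        self._upper = upper_map.get(self._char, self._char.upper())
        self._lower = lower_map.get(self._char, self._char.lower())
        self._title = upper_map.get(self._char, self._char.title())

    # Derived properties have no setters, so they are read-only.
    @property
    def upper_case(self):
        return self._upper

    @property
    def lower_case(self):
        return self._lower

    @property
    def title_case(self):
        return self._title


c = Character("i", "tr")
turkish_upper = c.upper_case
c.language = "en"          # derived values are refreshed automatically
english_upper = c.upper_case
```

Assigning to upper_case directly raises AttributeError, which is the 
read-only behaviour described above.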

Cheers,
Adem