[fpc-devel] Unicodestring branch, please test and help fixing
listmember
listmember at letterboxes.org
Wed Sep 10 15:37:21 CEST 2008
Michael Van Canneyt wrote:
> You are mixing 2 things. There is the actual string content, and there is the
> string metadata. The metadata is something that would apply for flyweight
> pattern. There is nothing to be gained by putting the metadata in an object,
This is true --up to a point.
And, that point arises when you wish to be able to work further with a
TCharacter.
Say, you're doing text processing --display and all. You would
definitely like to be able to derive a new class from TCharacter and
call it, say, TWPCharacter which contains all sorts of other properties,
color, style, font, size etc.
This would make life immensely easier for such jobs, where a character
may need more attributes than exist in the base class.
> since there is only the encoding. Storing the encoding in an object is
> ridiculous and a waste of heap space. a 2 byte encoding is less wasteful
> than a 4 or 8 byte object pointer.
I am afraid I do not agree with this at all. Or rather, it comes across
as a very ANSI-centric view.
You definitely need a 'language' attribute for a character.
'Locale' does not cut it simply because you can have mixed text i.e.
portions that belong to a different language.
Some weird characters in my locale (say, Turkish) do not necessarily
mean that that piece of string is in another language --it may well be a
transcription of /my/ name in a different character set (say, Greek).
Yet we all know that casing (upper, lower, title) has nothing to do
with the encoding; nor does collation order.
In the above example, I used Turkish and Greek {what an unfortunate
pairing, some might say :) } on purpose: both follow their own case
folding rules, as well as their own collation orders, and both of these
depend on a language attribute/property.
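
To make the casing point concrete, here is a rough sketch (in Python,
only because it is short). The helper names upper_for/lower_for and the
language codes "tr"/"el" are made up for illustration; only the Turkish
dotted/dotless i pairs are special-cased, everything else falls back to
the default, language-neutral mapping.

```python
# Hypothetical helpers: language-aware casing for a single character.
# Only the Turkish dotted/dotless i pairs are handled specially here.
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> I-with-dot, dotless i -> I
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}   # I -> dotless i, I-with-dot -> i


def upper_for(ch: str, language: str) -> str:
    """Upper-case ch according to the (assumed) language code."""
    if language == "tr" and ch in TURKISH_UPPER:
        return TURKISH_UPPER[ch]
    return ch.upper()          # default, language-neutral mapping


def lower_for(ch: str, language: str) -> str:
    """Lower-case ch according to the (assumed) language code."""
    if language == "tr" and ch in TURKISH_LOWER:
        return TURKISH_LOWER[ch]
    return ch.lower()          # default, language-neutral mapping
```

The same code point "i" gives a different upper case depending on the
language passed in, which is exactly the information the encoding alone
cannot supply.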
Without a language attribute, how would you handle these sorts of issues?
Using a parallel byte array?
Really?
Wouldn't it be a lot more humane to us developers if the TCharacter had
properties such as
-- Language
-- CollationOrder
-- UpperCase
-- LowerCase
-- TitleCase
where, on setting the Language property, all others get filled with their
correct values and are read-only.
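
Something along these lines is what I have in mind; treat it as a sketch
only. It is written in Python for brevity, the property names are my own
guesses, CollationOrder is left out because it needs real collation
tables, and only Turkish is special-cased.

```python
class TCharacter:
    """Sketch of a character whose casing follows its Language."""

    # Assumed special cases: Turkish dotted/dotless i only.
    _TR_UPPER = {"i": "\u0130", "\u0131": "I"}
    _TR_LOWER = {"I": "\u0131", "\u0130": "i"}

    def __init__(self, char: str, language: str = "und"):
        self._char = char
        self.language = language        # runs the setter below

    @property
    def language(self) -> str:
        return self._language

    @language.setter
    def language(self, value: str) -> None:
        # Setting the language refills every derived value.
        self._language = value
        turkish = value == "tr"
        if turkish and self._char in self._TR_UPPER:
            self._upper = self._TR_UPPER[self._char]
            self._title = self._upper   # same as upper case for these letters
        else:
            self._upper = self._char.upper()
            self._title = self._char.title()
        if turkish and self._char in self._TR_LOWER:
            self._lower = self._TR_LOWER[self._char]
        else:
            self._lower = self._char.lower()

    # Read-only: no setters are defined for the derived properties.
    @property
    def upper_case(self) -> str:
        return self._upper

    @property
    def lower_case(self) -> str:
        return self._lower

    @property
    def title_case(self) -> str:
        return self._title
```

Assigning to language is the only way to change the derived values;
trying to assign to upper_case and friends raises an error.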
Cheers,
Adem