[fpc-devel] Unicodestring branch, please test and help fixing

listmember listmember at letterboxes.org
Wed Sep 10 15:55:25 CEST 2008



> Yes, but most proposals here about a TCharacter are a bit overkill. For
> example, a language reference for a given char is not very important from
> a Unicode point of view; Unicode focuses its power on the text, so
> locale is important in context operations and collations.

See my other post above.

Locale should really have nothing to do with the text/string business.

Instead, it should only refer to oddities such as decimal number 
representations, thousands separators, date and time strings etc.

Packing the language into the 'locale' info is an abuse IMO, unless it 
refers to such things as what kind of help file it should display to the 
user or the actual strings on menu items (resources) etc.

> From my point of view the compiler's basic types must stay "basic":
> fast, using no more memory than needed, and so on.

Please don't resent this, but this kind of attitude is verging on being
offensive.

Instead of looking at the issue from the POV of "I don't need it" or "It
requires more hardware resources", can't you try to evaluate the need on
its own merit?

And, if you still think that you will never need it, please remember
that you don't have to, but others may.

> Bringing Unicode "power" to the basic string type is overkill; any
> Unicode operation will, in the best case, take twice as long, and
> some of them will be 40-50 times slower. A simple collation will take
> at least 4 times the memory needed by the string itself, and for most
> sorting needs the collation is unnecessary.

So?

What if it is a fact of life?

Such as 24-bit graphics. We all know it takes a lot more resources and
that only patsies need that much color; yet we all ended up using it.

Can't you consider this Unicode character in the same light? (No pun intended.)

> So think of a "new" user
> filling a TStringList with 1000 strings and invoking the Sort method;
> as the strings are Unicode they must be ordered using the locale
> collation or the general collation, and finally saying "20 seconds to
> sort 1000 strings!!!!, this looks even worse than javascript!!!!".

No. This is where you are mistaken, I'm afraid.

A TUnicodeStringList can contain strings from different collations and 
one 'locale' information will be useless in sorting out that mess. You 
need 'language' information in each of those strings to be able to 
properly sort that unicode list.
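
To illustrate the point, here is a toy sketch (in Python for brevity,
not Pascal). The two collation rules are deliberately simplified
assumptions, and names such as german_key, swedish_key and sort_tagged
are made up for this example; a real implementation would use proper
collation tables. It shows that the same two words sort differently
depending on the language, so each string has to carry its own tag.

```python
import unicodedata

def german_key(s):
    # Toy rule (assumption): drop diacritics, so "ä" sorts together with "a".
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Toy rule (assumption): in Swedish, å, ä and ö come after z.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(s):
    composed = unicodedata.normalize("NFC", s.lower())
    # Characters outside the toy alphabet sort last.
    return [SWEDISH_ORDER.index(c) if c in SWEDISH_ORDER else len(SWEDISH_ORDER)
            for c in composed]

# Hypothetical registry: language tag -> key function.
COLLATORS = {"de": german_key, "sv": swedish_key}

def sort_tagged(items):
    # items are (text, language) pairs. One possible policy: group by
    # language, then order each group by that language's own rule.
    return sorted(items, key=lambda item: (item[1], COLLATORS[item[1]](item[0])))

words = ["zebra", "äpple"]
print(sorted(words, key=german_key))   # ['äpple', 'zebra']
print(sorted(words, key=swedish_key))  # ['zebra', 'äpple']

mixed = [("zebra", "sv"), ("Zebra", "de"), ("äpple", "sv"), ("Äpfel", "de")]
print(sort_tagged(mixed))
```

Grouping by language first is only one possible policy for a mixed
list; the part that matters is that the key function is chosen per
string, not once for the whole list.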

> Maybe, again from my point of view, it is more logical to create
> "TTextUnicodeChar" and "TTextUnicodeString" classes which handle
> Unicode textual data, not Unicode data.

I can't see how you can do that. I can't see how we can cater for
Unicode data (not textual data, as you put it) in anything other than a
specific class [or data type].

> PS: As one of the problems of Unicode support is the large amount of
> data that must be stored (in the exe or an external file), is there any
> recommended way to code so that unused arrays are left out when the
> function that uses them is never called in the main program?

Storage is a completely different problem. You could use, say, UTF-8
encoding and also store the language information when necessary.
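
As a rough sketch of what that could look like (again in Python, and
with a byte layout that is purely an assumption made up for this
example): one length byte, an ASCII language tag, then the UTF-8 text.
An empty tag costs a single byte, so the language is only paid for
when it is actually stored.

```python
def pack_tagged(text, lang=""):
    # Hypothetical layout: [tag length: 1 byte][ASCII tag][UTF-8 text].
    tag = lang.encode("ascii")
    if len(tag) > 255:
        raise ValueError("language tag too long for a one-byte length")
    return bytes([len(tag)]) + tag + text.encode("utf-8")

def unpack_tagged(blob):
    # Reverse of pack_tagged: returns (text, lang).
    tag_len = blob[0]
    lang = blob[1:1 + tag_len].decode("ascii")
    text = blob[1 + tag_len:].decode("utf-8")
    return text, lang

blob = pack_tagged("äpple", "sv")
print(len(blob))            # 9: 1 length byte + 2 tag bytes + 6 UTF-8 bytes
print(unpack_tagged(blob))  # ('äpple', 'sv')
```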


