[fpc-devel] Unicodestring branch, please test and help fixing

Fri Sep 12 02:01:17 CEST 2008

listmember wrote:
>
>> I also do not know of other apps that could do this (and it may not be
>> possible). Look around. Databases, for example: AFAIK the most you can
>> do is define a collation per column.
> True. But that does not mean that those apps/databases are well
> thought out, does it?
That is a matter of point of view. Those DBs get sold, so either people
take what they can get and silently accept it (I haven't seen discussions
like this in related DB discussion groups, or maybe I read the wrong
groups :) ), or the majority of people don't need it.

BTW, people want their DB to sort text in a way that helps them find
entries in the result. So the ordering should not depend on knowing
whether a word is English or French. If it depended on the language, the
ordering would not help the search, because you would have to know the
language of every other word to find the one word you are looking for.

So maybe the design is well thought out after all?
>
>> And how would you sort the following example, with mixed collations?
>> Take the various German collations: "ae" can be used as a substitute
>> for a-umlaut (ä).
> This is actually an arbitrary decision (there is no agreed standard on
> this that I am aware of), so each developer can have their own way.
Well yes, of course you can define how to do it. But then everyone has a
different need and a different definition, which would mean FPC had to
implement dozens of algorithms.
So it seems better to leave it to each developer, since it will be an
individual thing anyway.
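To make the "ae" vs. a-umlaut point concrete, here is a small sketch in Python. The two key functions and their names are hypothetical simplifications, not FPC code and not a complete German collation. Two developers who pick different substitution rules get different sort orders for the same list, so no single built-in choice would suit everyone.

```python
# Hypothetical, simplified sort keys for German text (not a full collation).
# Rule A folds umlauts to the base vowel; rule B spells them out ("ä" -> "ae").
FOLD_TO_BASE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
EXPAND_TO_E = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def key_fold(word: str) -> str:
    """Lower-case the word, then treat an umlaut as its base vowel."""
    return word.lower().translate(FOLD_TO_BASE)


def key_expand(word: str) -> str:
    """Lower-case the word, then treat an umlaut as vowel + 'e'."""
    return word.lower().translate(EXPAND_TO_E)


words = ["Müller", "Mueller", "Muller", "Mulder"]
print(sorted(words, key=key_fold))    # ['Mueller', 'Mulder', 'Müller', 'Muller']
print(sorted(words, key=key_expand))  # ['Müller', 'Mueller', 'Mulder', 'Muller']
```

Under the first rule "Müller" lands next to "Muller". Under the second it lands next to "Mueller". Both are defensible, and that is exactly why the choice ends up with the individual developer.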

As for storing info per string or per char (info could be anything:
collation, color, style, font, source of quote, author, creation date,
file, ...): everyone would want their own. So again FPC shouldn't do it,
or everyone gets the overhead of everything all the others wanted.

Also, FPC is a programming language, not a word processing tool.
And FPC is Pascal. Pascal (AFAIK) has reference-counted strings, but
objects are not reference counted. Not to mention that objects (as a
string type) would only be of benefit if everyone were allowed to create
their own child classes.
Then, instead of asking for strings as objects, I would ask for an
additional reference-counted object type (with automatic destruction),
and the string library could be based on that. I am not asking for such
a thing because a) it wouldn't be Pascal anymore, and b) beware of the
memory leaks.

If Pascal doesn't suit the needs of a specific task, choose a different
tool instead of inventing a new Pascal.
I don't do shell scripts in Pascal, and simple web scripts are PHP or Perl.

>
>> How would you sort data where one source uses one collation and the
>> other source another (or, even worse, the collation changes halfway
>> through)? It is impossible by definition.
>
> No. It is not impossible.
> But yes, there is no definition (standard).
>
> It would be up to the developer or the entity that the developer is
> working in.
BTW, in normal math you cannot divide a number by zero... Of course, you
can define your own math.
>
>> I even think that collation is not part of the string. It does not
>> change the meaning of the string; it is only used in specific
>> operations. And then it must be one collation for both strings. So if
>> each of the strings had a collation, that would cause an issue.
>
> But my question is (IMHO) a lot more relevant to the thread at hand:
>
> How would you do a case-insensitive search in a multilingual text?
The same as above applies. If every char (or substring) has a collation
of its own, then you need to define how to compare across collations,
because

 find('E'[collation1],  'merci'[collation2] + 'mein herr'[collation3])

needs to compare an 'E' (which wants collation1 for the comparison) with
each 'e' (which wants a different collation). Maybe collation1 says that
'E' should equal both upper- and lower-case 'e', while the other
collations do not? Or vice versa?

There is no standard.
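A minimal Python sketch of that cross-collation conflict follows. It assumes a hypothetical model in which each fragment carries its own "collation", reduced here to a per-character folding function (real collations do far more). The same search returns different hits depending on whether the needle's collation or the haystack's decides, so the programmer has to pick a rule.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: a "collation" is reduced to a per-character folding
# function. Real collations are far richer; this only isolates the conflict.
Collation = Callable[[str], str]


def case_insensitive(ch: str) -> str:
    return ch.casefold()


def case_sensitive(ch: str) -> str:
    return ch


@dataclass
class Tagged:
    text: str
    collation: Collation


def find(needle: Tagged, haystack: List[Tagged], use: str) -> List[int]:
    """Positions in the concatenated haystack where the one-char needle matches.

    'use' selects whose collation decides each comparison: "needle" or
    "haystack". Some such rule must be chosen by the programmer.
    """
    hits = []
    pos = 0
    for part in haystack:
        fold = needle.collation if use == "needle" else part.collation
        for ch in part.text:
            if fold(ch) == fold(needle.text):
                hits.append(pos)
            pos += 1
    return hits


needle = Tagged("E", case_insensitive)
text = [Tagged("merci", case_sensitive), Tagged("mein herr", case_insensitive)]
print(find(needle, text, use="needle"))    # [1, 6, 11]
print(find(needle, text, use="haystack"))  # [6, 11]
```

Neither result is "wrong". The program simply had to decide which side's rule wins, and nothing in the strings themselves tells it how.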

>
> [This has nothing to do with rendering or GUI.]