[fpc-devel] Unicodestring branch, please test and help fixing
Martin Friebe
martin at hybyte.com
Fri Sep 12 02:01:17 CEST 2008
listmember wrote:
>
>> I also do not know of other apps that could do this. (And it may not be
>> possible). Look around. Databases for example, AFAIK the most you can do
>> is define a collation per column.
> True. But, that does not mean that those app/databases are well
> thought out. Does it?
That is a point of view. Those DBs get sold, so either people take what
they can get and silently accept it (I haven't seen discussions like this
on related DB discussion groups [ or maybe I read the wrong groups :) ])
or the majority of people don't need it.
BTW people want their DB to sort text in a way that helps them find
entries in the result. So the ordering process should not rely on
knowing whether a word is English or French. If it did rely on the
language, then the ordering would not help the search, because you would
have to know the language of all the other words to find the one word you
are looking for.
So maybe the design is quite well thought out?
>
>> And how would you sort the following example, with mixed collation. Take
>> the various german collations. ae can be used as a substitution for
>> a-umlaut.
> This is actually an arbitrary decision --there is no agreed standard on
> this, that I am aware-- so, each developer can have their own way.
Well yes, of course you can define how to do it. But then everyone has a
different need, and a different definition. That would mean FPC would
have to implement dozens of algorithms.
So it seems better to leave it to each person, as it seems it will be an
individual thing anyway.
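
To make the German case concrete: there are at least two common
conventions, one that sorts a-umlaut like a (dictionary style) and one
that sorts it like ae (phone book style). The following is only a rough
Python sketch, not FPC code and not a full implementation of either
convention; the names dictionary_key and phonebook_key are made up for
illustration.

```python
# Two simplified sort keys for German text. Both are illustrative
# approximations; the function names are hypothetical.
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def dictionary_key(word):
    # Dictionary style: an umlaut sorts like its base vowel.
    return word.lower().translate(DICTIONARY)

def phonebook_key(word):
    # Phone book style: an umlaut sorts like base vowel + 'e'.
    return word.lower().translate(PHONEBOOK)

words = ["Ärzte", "Adler", "Afrika"]
by_dictionary = sorted(words, key=dictionary_key)
by_phonebook = sorted(words, key=phonebook_key)

print(by_dictionary)  # ['Adler', 'Afrika', 'Ärzte']
print(by_phonebook)   # ['Adler', 'Ärzte', 'Afrika']
```

The same three words come out in two different orders, and both orders
are "correct German". If the words came from sources tagged with
different collations, there would be no single key function to pick.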
As for storing info per string or per char (info could be anything:
collation, color, style, font, source-of-quote, author, creation-date,
file, ...): everyone would like their own. So again FPC shouldn't do it.
Otherwise everyone gets all the overhead of what all the others wanted.
Also, FPC is a programming language, not a word processing tool.
And FPC is Pascal. Pascal (AFAIK) has reference counted strings, and
objects are not reference counted. Not to mention that objects (as a
string type) would only be a benefit if everyone was allowed to create
their own child classes.
In that case, instead of asking for strings as objects, I would ask for
an additional ref-counted object type (with auto destruction). The string
library could be based on this. I am not asking for such a thing,
because a) it wouldn't be Pascal anymore, and b) beware of the mem-leaks.
If Pascal doesn't suit the needs of a specific task, choose a different
tool instead of inventing a new Pascal.
I don't do shell scripts in Pascal. And simple web scripts are PHP or Perl.
>
>> How would you sort data where one source is of one collation, the other
>> source of another (or even worse the collation changes halfway through)?
>> It is impossible by definition.
>
> No. It is not impossible.
> But, yes, there is no definition (standard).
>
> It would be up to the developer or the entity that the developer is
> working in.
Btw, in normal math you cannot divide a number by zero... Of course you
can define your own math.
>
>> I even think that collation is not part of the string. It does not
>> change the meaning of the string. It is only used in specific
>> operations. And then it must be one collation for both strings. So if
>> each of the string had a collation that would cause an issue.
>
> But, my question is --imho-- a lot more relevant to the thread at hand:
>
> How would you do case-insensitive search in a multilingual text?
The same as above applies. If every char (or substring) has a collation
of its own, then you need to define how to compare across collations,
because
find('E'[collation1], 'merci'[collation2] + 'mein herr'[collation3])
needs to compare an E (that wants collation1 for the compare) with each
of the 'e' (that want other collations).
Maybe collation1 says that upper and lower case E should compare equal,
while the other collations do not? Or vice versa.
There is no standard.
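
A real case of this kind is the letter I: under English-like rules 'I'
and 'i' are the same letter, while under Turkish rules 'I' pairs with
dotless 'ı' and 'i' pairs with dotted 'İ'. The following is a minimal
Python sketch of the problem, not FPC code; fold_default, fold_turkish
and this find are hypothetical helpers, and each fold is a deliberately
simplified stand-in for a real collation.

```python
def fold_default(text):
    # English-like rule: 'I' and 'i' are the same letter.
    return text.lower()

def fold_turkish(text):
    # Turkish-like rule: 'I' pairs with dotless 'ı', 'İ' pairs with 'i'.
    return text.replace("I", "ı").replace("İ", "i").lower()

def find(needle, haystack, fold):
    # Case-insensitive search under ONE rule applied to both operands.
    # (Positions are only meaningful here because folding keeps lengths.)
    return fold(haystack).find(fold(needle))

needle = "I"        # imagine this tagged with one collation
haystack = "merci"  # and this tagged with another

pos_default = find(needle, haystack, fold_default)
pos_turkish = find(needle, haystack, fold_turkish)

print(pos_default)  # 4
print(pos_turkish)  # -1
```

The search only has an answer once a single rule is chosen for both
operands. If the needle and the haystack each insist on their own
collation, something still has to decide which one wins, and that
decision is exactly the part nobody has standardized.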
>
> [this has nothing to do with rendering or GUI.]