[fpc-devel] Unicode functions

JoshyFun joshyfun at gmail.com
Tue Aug 26 13:26:01 CEST 2008


Hello Graeme,

Tuesday, August 26, 2008, 8:59:36 AM, you wrote:

GG> I'm just curious how this works with Unicode characters. I guess the
GG> Unicode website should cover this in detail (though I haven't searched
GG> for it yet).  How do you convert from lower to upper case? Is there a
GG> set formula per Unicode Block?

After reading a good amount of documentation, I ended up with a lookup
table of roughly 950 entries that is searched with a binary search.
Most letters have no lowercase/uppercase counterpart, especially in
Arabic and the ideographic scripts, so for the most common alphabets
a lookup table seems to be the easiest solution, and it should remain
compatible as new Unicode versions are released.

GG> The ASCII (standard English alphabet) is easy. Low char minus 32
GG> decimal = Uppercase char

I'm thinking of handling them directly instead of going through the
lookup table, as they will be the most common case. But currently
everything goes through the same "complex" path.
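
The following sketch shows the ASCII shortcut, which is checked before falling back to the table. The names to_upper_cp and demo_slow_path are hypothetical. The one-entry dictionary only stands in for the real table lookup.

```python
from typing import Callable


def to_upper_cp(cp: int, slow_path: Callable[[int], int]) -> int:
    """Uppercase one code point, with an ASCII fast path."""
    if 0x61 <= cp <= 0x7A:  # 'a'..'z' are exactly 0x20 (32) above 'A'..'Z'
        return cp - 0x20
    if cp < 0x80:           # any other ASCII character has no case change
        return cp
    return slow_path(cp)    # everything else goes to the table lookup


def demo_slow_path(cp: int) -> int:
    # Hypothetical stand-in for the binary-searched table.
    return {0x00F6: 0x00D6}.get(cp, cp)  # ö -> Ö


result = "".join(chr(to_upper_cp(ord(c), demo_slow_path)) for c in "hello, wörld 123")
print(result)  # HELLO, WÖRLD 123
```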

GG> What do you do for other non-English languages and special characters?
GG> I noticed the "Latin-1 Supplement" block has the same formula as the
GG> standard English alphabet.
GG> ö (u+00F6) -> Ö (u+00D6)

No, no formulas at all.
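
To see why a per-block formula breaks down, this sketch checks the "minus 0x20" rule against Python's built-in str.upper. It covers the lowercase half of Latin-1 Supplement (U+00E0..U+00FF). The rule holds for most of that range, but not for all of it.

```python
# Compare "cp - 0x20" with Python's Unicode-aware str.upper() for U+00E0..U+00FF.
exceptions = []
for cp in range(0x00E0, 0x0100):
    ch = chr(cp)
    if ch.upper() != chr(cp - 0x20):
        exceptions.append((f"U+{cp:04X}", ch, ch.upper()))

for e in exceptions:
    print(e)
# ('U+00F7', '÷', '÷')   division sign has no case; the formula would give ×
# ('U+00FF', 'ÿ', 'Ÿ')   uppercase is U+0178, in Latin Extended-A
```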

GG> "Latin Extended-A" block doesn't. There the upper and lower characters
GG> are next to each other.  ę (u+0119) -> Ę (u+0118)

The real problem is that some characters map to a different number of
characters in upper case than in lower case, for example something like:

"A" in lowercase is "ai", but "ai" in uppercase is "AI". These situations
are not currently handled; fortunately they are not very common.
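
The "A"/"ai" example above is schematic, but real cases exist in Unicode's SpecialCasing data. Python's str.upper and str.lower apply these full mappings, so they offer a quick way to list a few. A table that maps one code point to one code point would probably need a separate list for such entries.

```python
# Full (possibly multi-character) case mappings, as applied by str.upper()/str.lower().
# ß -> "SS" when uppercased; the ﬁ ligature -> "FI";
# İ (capital I with dot above) -> "i" + U+0307 (combining dot) when lowercased.
samples = ["\u00DF", "\uFB01", "\u0130"]

results = {}
for s in samples:
    up, low = s.upper(), s.lower()
    results[s] = (up, low)
    print(f"U+{ord(s):04X}: upper has {len(up)} char(s), lower has {len(low)} char(s)")
```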

GG> I can't even imagine how things like Hebrew, Greek etc works....  I
GG> guess the Unicode UpperCase() function could become quite complex.

Upper- and lowercase conversion is quite simple compared with "SameText" ;)
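
One reason SameText is harder is that a case-insensitive comparison must handle two problems:

- length-changing mappings, such as ß versus "SS";
- equivalent spellings, such as a precomposed "é" versus "e" plus a combining accent.

The sketch below assumes that Unicode's "canonical caseless match" is the desired behavior: NFD, then full case folding, then NFD again. same_text is a hypothetical helper, not FPC's SameText.

```python
import unicodedata


def _caseless_key(s: str) -> str:
    # NFD first so precomposed and decomposed forms agree, then full case
    # folding (str.casefold), then NFD again since folding may undo NFD.
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def same_text(a: str, b: str) -> bool:
    return _caseless_key(a) == _caseless_key(b)


print(same_text("Straße", "STRASSE"))         # True: ß folds to "ss"
print(same_text("caf\u00E9", "CAFE\u0301"))   # True: é vs E + combining acute
print("Straße".lower() == "STRASSE".lower())  # False: lower() alone is not enough
```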

-- 
Best regards,
 JoshyFun



