[fpc-devel] Unicode conversion routines

Sun Nov 23 18:38:24 CET 2008

Hello Jonas,

Sunday, November 23, 2008, 6:14:11 PM, you wrote:

>> //LowerCase arrays size: 2376 bytes
>> const UnicodeLowerCaseArraySource: array [0..593] of WORD=(
>> //UpperCase arrays size: 2408 bytes
>> const UnicodeUpperCaseArraySource: array [0..601] of WORD=(
>> //TitleCase arrays size: 2424 bytes
>> const UnicodeTitleCaseArraySource: array [0..605] of WORD=(

JM> How does this work, given that upper/lower case sometimes depends on
JM> the language? (e.g., in Turkish the upper case version of "i" is "İ"
JM> -- LATIN CAPITAL LETTER I WITH DOT ABOVE)

Only the general case is covered; language tailoring is a completely
different beast. Basic Unicode functions are language agnostic and will
certainly produce some bad results in some circumstances. It's almost
impossible to cover all tailorings, even using the database of tailorings,
not so much for upper/lower case as for other operations like word breaking.
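
To make "language agnostic" concrete, here is a minimal sketch of a
table-driven simple case mapping. It assumes each source array is sorted
and paired with a destination array of the same length; that pairing, the
names LowerSource, LowerDest and SimpleLowerCase, and the four-entry
tables are all made up for illustration and are not the code from the
patch.

```pascal
program SimpleCaseDemo;
{$mode objfpc}

// Hypothetical, heavily truncated tables for illustration only; the
// arrays quoted above have roughly 600 entries each.
// LowerSource must be sorted in ascending order for the binary search.
const
  LowerSource: array[0..3] of Word = ($0041, $0049, $00C9, $0130);
  LowerDest:   array[0..3] of Word = ($0061, $0069, $00E9, $0069);

// Language-agnostic lower-casing: binary search in the source table,
// return the paired destination entry. Code points without an entry
// are returned unchanged.
function SimpleLowerCase(CodePoint: Word): Word;
var
  Lo, Hi, Mid: Integer;
begin
  Result := CodePoint;
  Lo := Low(LowerSource);
  Hi := High(LowerSource);
  while Lo <= Hi do
  begin
    Mid := (Lo + Hi) div 2;
    if LowerSource[Mid] = CodePoint then
    begin
      Result := LowerDest[Mid];
      Exit;
    end
    else if LowerSource[Mid] < CodePoint then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
end;

begin
  // 'A' (U+0041) maps to 'a' (U+0061).
  WriteLn(HexStr(SimpleLowerCase($0041), 4));
  // 'I' (U+0049) maps to 'i' (U+0069) regardless of language; a Turkish
  // tailoring would expect dotless i (U+0131) here instead.
  WriteLn(HexStr(SimpleLowerCase($0049), 4));
  // A code point with no table entry comes back unchanged.
  WriteLn(HexStr(SimpleLowerCase($0031), 4));
end.
```

The lookup has no locale parameter at all, which is exactly why the
Turkish case cannot come out right with this kind of function.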

Libraries that cover a lot of language particularities need around
30-40 (or more) megabytes of runtime data, and I think this kind of
dependency is a no-no for FPC.

-- 
Best regards,
 JoshyFun