[fpc-devel] Unicode conversion routines

Sun Nov 23 18:51:35 CET 2008

JoshyFun wrote:
> Hello Jonas,
> 
> Sunday, November 23, 2008, 6:14:11 PM, you wrote:
> 
>>> //LowerCase arrays size: 2376 bytes
>>> const UnicodeLowerCaseArraySource: array [0..593] of WORD=(
>>> //UpperCase arrays size: 2408 bytes
>>> const UnicodeUpperCaseArraySource: array [0..601] of WORD=(
>>> //TitleCase arrays size: 2424 bytes
>>> const UnicodeTitleCaseArraySource: array [0..605] of WORD=(
> 
> JM> How does this work, given that upper/lower case sometimes depends on
> JM> the language? (e.g., in Turkish the upper case version of "i" is "İ"
> JM> -- LATIN CAPITAL LETTER I WITH DOT ABOVE)
> 
> Only the general case; language tailoring is a completely different beast.
> The basic Unicode functions are language agnostic and will certainly
> produce some bad results in some circumstances. It's almost impossible
> to cover all tailoring, even using the database of tailorings, not so
> much for upper/lower casing as for other operations like word breaking.
> 
> Libraries that cover a lot of language particularities need around
> 30-40 megabytes (or more) of runtime data, and I think this kind of
> dependency is a no-no for FPC.
> 
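
To make the "general case only" point concrete, here is a minimal sketch of a language-agnostic lookup. The record layout, the four-entry table and the name SimpleUpper are hypothetical and chosen for illustration; the arrays quoted above are truncated, so this is not their actual layout. The mappings themselves are the simple uppercase mappings from the Unicode character database. Note that "i" always becomes U+0049, which is exactly the result Turkish text would not want.

```pascal
program SimpleCaseMap;
{$mode objfpc}{$H+}

type
  TCasePair = record
    Code: Word;   // source code point (BMP only)
    Upper: Word;  // language-neutral simple uppercase mapping
  end;

const
  // Hypothetical, tiny excerpt sorted by Code. A real table covers the
  // whole Unicode database and may be laid out very differently.
  UpperPairs: array[0..3] of TCasePair = (
    (Code: $0069; Upper: $0049),  // i -> I (Turkish would want U+0130)
    (Code: $00E9; Upper: $00C9),  // e with acute -> E with acute
    (Code: $0131; Upper: $0049),  // dotless i -> I
    (Code: $03B1; Upper: $0391)   // Greek alpha -> Alpha
  );

// Binary search over the sorted table.
// Code points without an entry map to themselves.
function SimpleUpper(C: Word): Word;
var
  L, H, M: Integer;
begin
  Result := C;
  L := Low(UpperPairs);
  H := High(UpperPairs);
  while L <= H do
  begin
    M := (L + H) div 2;
    if UpperPairs[M].Code = C then
      Exit(UpperPairs[M].Upper)
    else if UpperPairs[M].Code < C then
      L := M + 1
    else
      H := M - 1;
  end;
end;

begin
  // 'i' is mapped the same way whatever the language of the text is.
  WriteLn(HexStr(SimpleUpper($0069), 4));
  // 'A' has no entry, so it is returned unchanged.
  WriteLn(HexStr(SimpleUpper($0041), 4));
end.
```

A tailored implementation would need an extra, locale-dependent step in front of this lookup, which is the part the basic functions deliberately leave out.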

I think we should simply depend on the OS in this case, as the cwstring
unit does, though Linux doesn't make life easy here: it always requires
a conversion to UCS-4 to upper- or lower-case a string.