[fpc-devel] Unicode support in RTL - Roadmap

Fri Nov 21 16:16:31 CET 2008

>> So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would
>> be 1.
> Or 2, depending on whether it's precomposed or decomposed.
I seem to remember that we discussed this some time ago, and the result
was that the composed (Mac style?) characters are in fact a single
Unicode character that consists of two (maybe more?) complete code
points tied together by some special coding, so IMHO it can be
considered a single Unicode character in both cases. If this would
result in a huge table of possibly composed characters, I think we
should stick to the concept of providing decent functionality and
restrict ourselves to those that are currently used by the "customers"
we normally address (Mac in Europe and America). A method to supply an
extended composition table should be provided so that those who really
need it can help themselves.
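
To make the two counts concrete, here is a minimal sketch. The function
bodies are only my assumption of what UTF8ElementLength and
UTF8PointLength could look like; nothing like this exists in the RTL
yet. It counts raw code points without any composition table, so the
decomposed 'Ü' yields 2; treating it as 1 would need the table
discussed above.

```pascal
program Utf8Lengths;
{$mode objfpc}{$H+}

// Hypothetical helper: number of UTF-8 code units (bytes) in S.
function UTF8ElementLength(const S: RawByteString): SizeInt;
begin
  Result := Length(S);
end;

// Hypothetical helper: number of code points in S. Every byte that is
// not a continuation byte (10xxxxxx) starts a new code point.
// Assumes S is valid UTF-8; combining marks are counted separately.
function UTF8PointLength(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

const
  Precomposed = #$C3#$9C;     // U+00DC as one code point
  Decomposed  = 'U'#$CC#$88;  // U+0055 followed by combining U+0308

begin
  // prints "2 1"
  WriteLn(UTF8ElementLength(Precomposed), ' ', UTF8PointLength(Precomposed));
  // prints "3 2"
  WriteLn(UTF8ElementLength(Decomposed), ' ', UTF8PointLength(Decomposed));
end.
```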
>> which does not make sense if UTF8PointLength(utfstring_1) is smaller 
>> than UTF8PointLength(utfstring_2).
> It does not make any sense under any circumstances, because there is 
> no way for "UTF8PointSetLength" to know how many bytes it has to 
> allocate when you pass a value (any value, regardless of where it 
> comes from) to it.
If UTF8PointLength(utfstring_1) is greater than
UTF8PointLength(utfstring_2), no new bytes need to be allocated; the
function is simply equivalent to

utfstring_1 := UTF8PointCopy(utfstring_1, 1, UTF8PointLength(utfstring_2));

To me this does not seem to pose any problem.
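
A rough sketch of the shrinking case, assuming UTF8PointCopy takes a
1-based code point index and a code point count (the signature and
body here are my guess, not an agreed API). It only finds the byte
offsets of the first and last wanted code points and then uses the
ordinary Copy, so nothing has to be allocated beyond the result.

```pascal
program Utf8Truncate;
{$mode objfpc}{$H+}

// Hypothetical helper: number of code points in valid UTF-8 text.
function UTF8PointLength(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

// Hypothetical helper: copy Count code points starting at code point
// Index (1-based). Assumes valid UTF-8 and Index >= 1.
function UTF8PointCopy(const S: RawByteString;
  Index, Count: SizeInt): RawByteString;
var
  I, Point, StartByte, EndByte: SizeInt;
begin
  StartByte := Length(S) + 1;  // default: nothing to copy
  EndByte := Length(S) + 1;    // default: copy up to the end
  Point := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      Inc(Point);
      if Point = Index then
        StartByte := I;
      if Point = Index + Count then
      begin
        EndByte := I;  // first byte after the requested range
        Break;
      end;
    end;
  Result := Copy(S, StartByte, EndByte - StartByte);
end;

var
  utfstring_1, utfstring_2: RawByteString;
begin
  // "Gr" + U+00FC + U+00DF + "e": 5 code points, 7 bytes
  utfstring_1 := 'Gr'#$C3#$BC#$C3#$9F'e';
  utfstring_2 := 'abc';  // 3 code points
  utfstring_1 := UTF8PointCopy(utfstring_1, 1, UTF8PointLength(utfstring_2));
  // prints "4 3": 4 bytes left, holding 3 code points
  WriteLn(Length(utfstring_1), ' ', UTF8PointLength(utfstring_1));
end.
```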

-Michael