[fpc-devel] Unicode support in RTL - Roadmap
Michael Schnell
mschnell at lumino.de
Fri Nov 21 16:16:31 CET 2008
>> So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would
>> be 1.
> Or 2, depending on whether it's precomposed or decomposed.
I seem to remember that we discussed this some time ago. The result
was that the composed (Mac-style?) characters are in fact a single
Unicode character that consists of two (maybe more?) complete code
points tied together by some special coding, so IMHO it can be
considered a single Unicode character in both cases. If this would
require a huge table of possible compositions, I think we should stick
to the concept of providing decent functionality and restrict the table
to those compositions currently used by the "customers" we normally
address (Mac in Europe and America). A method for supplying an extended
composition table should be provided so that those who really need it
can help themselves.
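
To make the two lengths under discussion concrete, here is a minimal
sketch. The function names are the ones proposed in this thread, not
existing RTL routines, and the implementation is only an assumption: it
counts every byte that is not a continuation byte, expects well-formed
UTF-8, and uses RawByteString (FPC 3.0 or later) so that no code page
conversion touches the bytes. Note that it counts plain code points, so
the decomposed form comes out as 2; treating it as one character would
need knowledge of combining marks, which this sketch does not have.

```pascal
program Utf8Lengths;
{$mode objfpc}{$H+}

// Hypothetical helpers named after the functions discussed in this
// thread; they are not existing RTL routines.

// Number of UTF-8 code units (bytes) in S.
function UTF8ElementLength(const S: RawByteString): SizeInt;
begin
  Result := Length(S);
end;

// Number of code points: count every byte that is not a continuation
// byte (bit pattern 10xxxxxx). Assumes S is well-formed UTF-8.
function UTF8PointLength(const S: RawByteString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

const
  Precomposed = #$C3#$9C;     // U+00DC as a single code point
  Decomposed  = 'U'#$CC#$88;  // U+0055 followed by combining U+0308

begin
  // Prints "2 1": two bytes, one code point.
  WriteLn(UTF8ElementLength(Precomposed), ' ', UTF8PointLength(Precomposed));
  // Prints "3 2": three bytes, two code points.
  WriteLn(UTF8ElementLength(Decomposed), ' ', UTF8PointLength(Decomposed));
end.
```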
>> which does not make sense if UTF8PointLength(utfstring_1) is smaller
>> than UTF8PointLength(utfstring_2).
> It does not make any sense under any circumstances, because there is
> no way for "UTF8PointSetLength" to know how many bytes it has to
> allocate when you pass a value (any value, regardless of where it
> comes from) to it.
If UTF8PointLength(utfstring_1) is greater than
UTF8PointLength(utfstring_2), no new bytes need to be allocated; the
function is simply equivalent to
utfstring_1 := UTF8PointCopy(utfstring_1, 1, UTF8PointLength(utfstring_2));
To me this does not seem to pose any problem.
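
The shrinking case can be sketched as follows. UTF8PointCopy is the
name used in this thread, not an existing RTL function, and this
implementation is only an assumption: it walks the bytes, treats every
non-continuation byte as the start of a code point, and expects
well-formed UTF-8 in a RawByteString (FPC 3.0 or later).

```pascal
program Utf8Truncate;
{$mode objfpc}{$H+}

// Hypothetical sketch of UTF8PointCopy as named in this thread; not an
// existing RTL function. Indices and counts are in code points.
function UTF8PointCopy(const S: RawByteString;
  StartPoint, PointCount: SizeInt): RawByteString;
var
  i, Point, FirstByte, LastByte: SizeInt;
begin
  Result := '';
  if (StartPoint < 1) or (PointCount < 1) then
    Exit;
  Point := 0;
  FirstByte := 0;
  LastByte := Length(S);
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then  // lead byte: a code point starts here
    begin
      Inc(Point);
      if Point = StartPoint then
        FirstByte := i
      else if Point = StartPoint + PointCount then
      begin
        LastByte := i - 1;              // stop before the first excluded point
        Break;
      end;
    end;
  if FirstByte > 0 then
    Result := Copy(S, FirstByte, LastByte - FirstByte + 1);
end;

var
  s: RawByteString;
begin
  // 'Gr' + U+00FC + U+00DF + 'e': 5 code points, 7 bytes.
  s := 'Gr'#$C3#$BC#$C3#$9F'e';
  // Shrink to 3 code points; only a shorter copy is taken, nothing grows.
  s := UTF8PointCopy(s, 1, 3);
  WriteLn(Length(s));  // prints 4: 'G', 'r' and the two bytes of U+00FC
end.
```

Growing a string this way is a different matter, since the byte count
of the code points to be added is not known in advance; the sketch
covers only the shrinking case described above.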
-Michael