[fpc-devel] Unicode support in RTL - Roadmap
Michael Schnell
mschnell at lumino.de
Fri Nov 21 14:50:00 CET 2008
>>
>
> If Length() would return its value in chars, what length in *bytes*
> would the following call set:
>
> SetLength(utfstring_1, Length(utfstring_2));
>
I don't really understand your question.
I think we would need two different functions,
UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String), the first
giving the string length in code elements (bytes) and the second giving the
length in code points (Unicode characters).
So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would be 1.
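To make the two counts concrete, here is a minimal sketch of both functions. Neither is an existing RTL routine; the names are just the ones proposed above, and the code assumes the string holds valid UTF-8 stored in an AnsiString, so that the built-in Length() counts bytes.

```pascal
program Utf8LengthDemo;

{$mode objfpc}{$H+}

// Hypothetical helper: length in code elements (bytes).
function UTF8ElementLength(const S: AnsiString): SizeInt;
begin
  Result := Length(S);
end;

// Hypothetical helper: length in code points. Counts every byte that is
// not a continuation byte (10xxxxxx). Assumes S is valid UTF-8.
function UTF8PointLength(const S: AnsiString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  S := Chr($C3) + Chr($9C);       // 'Ü' (U+00DC) written as raw UTF-8 bytes
  WriteLn(UTF8ElementLength(S));  // 2
  WriteLn(UTF8PointLength(S));    // 1
end.
```

The string is built from explicit byte values so that the result does not depend on the encoding of the source file.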
I think we should have a third function, Length(UTF8String), that the
user can map to either of the two (e.g. via a {$...} option).
The same would be necessary for the SetLength function,
e.g.
(1) UTF8ElementSetLength(utfstring_1, UTF8ElementLength(utfstring_2));
or
(2) UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));
(2) would work as expected if the purpose is to delete all but the first
n characters in a string.
I don't see a decent use for (1) other than creating a string long
enough to use as a buffer for e.g. TStream.Read.
I do see that there is in fact a compatibility problem when porting old
code with the setting UTF8Count=Point.
Here,
SetLength(utfstring_1, Length(utfstring_2)); would be translated as
UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));
which does not make sense if UTF8PointLength(utfstring_1) is smaller
than UTF8PointLength(utfstring_2).
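The truncation case (2) and the awkward growing case can be sketched as follows. UTF8PointSetLength is hypothetical, not an RTL routine, and the choice to leave the string untouched when asked to grow is only an assumption made for this sketch: a code point can occupy one to four bytes, so there is no obvious byte count to allocate for code points that do not exist yet.

```pascal
program Utf8PointSetLengthDemo;

{$mode objfpc}{$H+}

// Hypothetical helper (not an RTL routine): keep only the first NewPoints
// code points of S. Assumes S holds valid UTF-8.
procedure UTF8PointSetLength(var S: AnsiString; NewPoints: SizeInt);
var
  I, Points: SizeInt;
begin
  Points := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then  // not a continuation byte: a code point starts here
    begin
      if Points = NewPoints then
      begin
        SetLength(S, I - 1);            // cut just before this code point
        Exit;
      end;
      Inc(Points);
    end;
  // S has NewPoints code points or fewer: nothing to cut. Growing is
  // deliberately not implemented in this sketch.
end;

var
  S: AnsiString;
begin
  S := Chr($C3) + Chr($9C) + 'ber';  // "Über": 5 bytes, 4 code points
  UTF8PointSetLength(S, 2);
  WriteLn(Length(S));                // 3 bytes: "Üb"
  UTF8PointSetLength(S, 10);         // request to grow: string stays as it is
  WriteLn(Length(S));                // still 3
end.
```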
-Michael