[fpc-devel] Unicode support in RTL - Roadmap
Jonas Maebe
jonas.maebe at elis.ugent.be
Fri Nov 21 15:27:55 CET 2008
On 21 Nov 2008, at 14:50, Michael Schnell wrote:
>> If Length() would return its value in chars, what length in *bytes*
>> would the following call set:
>>
>> SetLength(utfstring_1, Length(utfstring_2));
>>
> I don't really understand your question.
>
> I think we would need to have two different functions,
>
> UTF8ElementLength(UTF8String) and UTF8PointLength(UTF8String),
> the first giving the string length in code elements (bytes) and the
> second giving the length in code points (Unicode characters).
>
> So UTF8ElementLength('Ü') would be 2 and UTF8PointLength('Ü') would
> be 1.
Or 2, depending on whether it's precomposed or decomposed.
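
A minimal sketch of the two cases, assuming a recent FPC (RawByteString
is not available in older releases). UTF8PointLength here is only a
hypothetical helper written for this illustration, not an existing RTL
routine; it counts every byte that is not a UTF-8 continuation byte.

```pascal
program PrecomposedVsDecomposed;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
function UTF8PointLength(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

const
  // U+00DC (precomposed U with diaeresis), encoded as UTF-8
  Precomposed = #$C3#$9C;
  // U+0055 'U' followed by U+0308 (combining diaeresis), as UTF-8
  Decomposed = 'U'#$CC#$88;

begin
  // bytes, then code points
  WriteLn(Length(Precomposed), ' ', UTF8PointLength(Precomposed)); // 2 1
  WriteLn(Length(Decomposed), ' ', UTF8PointLength(Decomposed));   // 3 2
end.
```

Both strings render as the same character, yet they differ in byte
count and in code point count.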
> I think we should have a third function Length(UTF8String) that can
> be selected by the user (e.g. via a {$ option) to be mapped to either
> of the two.
He's simply talking about the case where Length is mapped to your
proposed UTF8PointLength.
> I do see that there in fact is a compatibility problem when porting
> old code with the setting of UTF8Count=Point.
>
> Here,
>
> SetLength(utfstring_1, Length(utfstring_2)); would be translated as
> UTF8PointSetLength(utfstring_1, UTF8PointLength(utfstring_2));
>
> which does not make sense if UTF8PointLength(utfstring_1) is smaller
> than UTF8PointLength(utfstring_2).
It does not make any sense under any circumstances, because there is
no way for "UTF8PointSetLength" to know how many bytes it has to
allocate when you pass a value (any value, regardless of where it
comes from) to it.
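
To illustrate (again only a sketch, with UTF8PointLength as a
hypothetical helper and RawByteString assumed to be available): two
strings with the same number of code points can occupy very different
numbers of bytes, so a code point count alone does not determine an
allocation size.

```pascal
program PointCountVersusBytes;
{$mode objfpc}{$H+}

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
function UTF8PointLength(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  A, B: RawByteString;
begin
  A := 'abc';                                 // three 1-byte code points
  B := #$E2#$82#$AC#$E2#$82#$AC#$E2#$82#$AC;  // three times U+20AC, 3 bytes each

  // Same code point count, different byte length.
  WriteLn(UTF8PointLength(A), ' code points in ', Length(A), ' bytes'); // 3 code points in 3 bytes
  WriteLn(UTF8PointLength(B), ' code points in ', Length(B), ' bytes'); // 3 code points in 9 bytes
end.
```

In UTF-8, n code points can take anywhere from n to 4*n bytes, and a
"UTF8PointSetLength(s, n)" call has no way of telling which.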
Jonas