[fpc-devel] Unicode RTL
Tomas Hajny
XHajT03 at mbox.vol.cz
Wed Nov 16 17:08:54 CET 2005
Marco van de Voort wrote:
>> >> >
>> >> > ... has a different implementation for utf-8 and 8-bit code pages.
>> >>
>> >> Why? With utf-8 a string is searched, with 8-bit cp one char. No other
>> >> char/sequence of char other than ë can generate the byte sequence
>> >> representing ë
>> >
>> > const s : 'Daniël';
>> >
>> > var accent : utf8char;
>> >
>> > x:=pos('i','Daniël');
>> > accent:=s[x+1];
>>
>> We could have special support for assignment to type utf8char, couldn't
>> we?
>
> It would be horribly slow, since this would apply to length too, and think of
> while i<length(x) do inc(i); like constructs.
>
> I think the avg delphi code simply assumes 100% that chars are fixed width.
I'm afraid you won't get very far with that assumption. "Existing
Delphi code" most probably isn't DBCS/MBCS safe.
Regarding constructs like "while i<length(x) do" - I'd say that the most
common uses of these are comparison, copying, conversion to
uppercase/lowercase, and combinations of these. All of these operations
should be performed using dedicated (RTL) functions; otherwise they will
fail in a DBCS/MBCS environment anyway (or at least result in a suboptimal
implementation).
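To make the point concrete, here is a rough sketch of why byte indexing breaks down on the 'Daniël' example and what a dedicated routine would have to do instead. The helpers Utf8SeqLen, Utf8CodePointCount and Utf8CharAt are hypothetical names invented for this illustration, not existing RTL functions, and the sketch assumes well-formed UTF-8 stored in a plain AnsiString.

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper (not an RTL routine): number of bytes in the UTF-8
  sequence introduced by lead byte B. Returns 1 for ASCII and for any byte
  that is not a valid lead byte, so a scan always makes progress. }
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

{ Hypothetical helper: count characters by stepping over whole sequences
  instead of single bytes. }
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Inc(I, Utf8SeqLen(Ord(S[I])));
    Inc(Result);
  end;
end;

{ Hypothetical helper: return the complete character that starts at byte
  index I, rather than only its first byte as S[I] would. }
function Utf8CharAt(const S: AnsiString; I: Integer): AnsiString;
begin
  Result := Copy(S, I, Utf8SeqLen(Ord(S[I])));
end;

var
  S: AnsiString;
  X: Integer;
begin
  { 'Daniël' written with explicit bytes, so the result does not depend on
    the encoding of this source file: ë is $C3 $AB in UTF-8. }
  S := 'Dani'#$C3#$AB'l';
  X := Pos('i', S);
  WriteLn('Length in bytes:      ', Length(S));
  WriteLn('Length in characters: ', Utf8CodePointCount(S));
  WriteLn('Bytes needed for the character at X+1: ',
          Length(Utf8CharAt(S, X + 1)));
end.
```

Here Length(S) is 7 while the string holds 6 characters, and S[X+1] on its own is only the first of the two bytes that make up the accented letter. A loop such as "while i<length(x) do inc(i)" therefore visits bytes, not characters, which is why the work belongs in a dedicated routine.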
Tomas