[fpc-devel] Unicode in the RTL (my ideas)
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Aug 21 13:52:08 CEST 2012
Martin Schreiber wrote:
>> All the "access a char by index into a string" code I have seen works
>> in a sequential manner 99.99% of the time. For that reason there is no
>> speed difference between using a UTF-16 or UTF-8 encoded string. Both
>> can be coded equally efficiently.
>>
> Graeme, this is simply not true. When searching for known German characters
> in a UnicodeString, the program can use the simple approach of indexing by
> character (code unit). This is even possible for known Chinese symbols in the
> BMP. And a simple "if" for surrogate pairs is more efficient than a 4-stage
> "case" for UTF-8.

The good ole Pos() can do that, so why look for more complicated
implementations?

You are still trying to use old coding patterns that are simply inappropriate
for dealing with Unicode strings. Why make a distinction between
searching for a single character and searching for multiple characters, when
it's known that one character can require multiple bytes or words in UTF-8/16?

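
To illustrate the point, here is a minimal sketch in Python (not Pascal, and not the RTL's actual Pos() implementation) of a Pos()-style substring search that works purely on code units. The helper names encode_units, pos and find are made up for this example. It treats "search for one character" as nothing more than "search for a substring of one or more code units", so the same routine serves UTF-8 (byte units) and UTF-16 (16-bit units) without any special case for surrogate pairs or multi-byte sequences.

```python
def encode_units(text, encoding):
    """Return text as a list of code units: bytes for UTF-8, 16-bit words for UTF-16."""
    if encoding == "utf-8":
        return list(text.encode("utf-8"))
    if encoding == "utf-16":
        raw = text.encode("utf-16-le")
        # Combine each little-endian byte pair into one 16-bit code unit.
        return [raw[i] | (raw[i + 1] << 8) for i in range(0, len(raw), 2)]
    raise ValueError("unsupported encoding: " + encoding)


def pos(needle, haystack):
    """Pos()-style search: 1-based code-unit index of needle, or 0 if absent."""
    n = len(needle)
    if n == 0:
        return 0  # assumption: an empty needle is reported as "not found"
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i + 1
    return 0


def find(needle_text, text, encoding):
    """Search for a character or a longer string; both go through the same path."""
    return pos(encode_units(needle_text, encoding), encode_units(text, encoding))


if __name__ == "__main__":
    for enc in ("utf-8", "utf-16"):
        # A BMP character, a non-BMP character (surrogate pair in UTF-16),
        # and a two-character substring are all handled identically.
        print(enc, find("ü", "Grüße", enc), find("𝄞", "aü𝄞b", enc), find("ße", "Grüße", enc))
```

The returned index counts code units, not characters, which is the same convention Pos() follows for the string type it is given. Because both UTF-8 and UTF-16 are self-synchronizing at the code-unit level, a match found this way always starts on a character boundary.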
DoDi