[fpc-devel] Unicode in the RTL (my ideas)

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Aug 21 13:52:08 CEST 2012


Martin Schreiber wrote:

>> All the "access a char by index into a string" code I have seen works
>> in a sequential manner 99.99% of the time. For that reason there is no
>> speed difference between using a UTF-16 or UTF-8 encoded string. Both
>> can be coded equally efficiently.
>>
> Graeme, this is simply not true. When searching for known German
> characters in a UnicodeString, the program can use the simple approach of
> indexing by character (code unit). This is even possible for known Chinese
> symbols in the BMP. And a simple "if" for surrogate pairs is more
> efficient than a 4-stage "case" for UTF-8.

The good ole Pos() can do that, why search for more complicated 
implementations?

You are still trying to use old coding patterns that are simply
inappropriate for dealing with Unicode strings. Why make a distinction
between searching for a single character and searching for multiple
characters, when it's known that one character can require multiple bytes
or words in UTF-8/16?
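
To illustrate the point, here is a minimal sketch of a Pos()-style substring
search that works on code units only. It is written in Python rather than
Pascal, purely as an illustration; the names pos, utf8_units and utf16_units
are hypothetical and not part of the FPC RTL. It assumes both strings are
well-formed and use the same encoding and normalization. Because UTF-8 and
UTF-16 are both self-synchronizing, a well-formed needle cannot match in the
middle of another character, so the search needs no special case for
multi-unit characters.

```python
def utf8_units(s: str) -> tuple:
    """Return the UTF-8 code units (bytes) of s as a tuple of ints."""
    return tuple(s.encode("utf-8"))


def utf16_units(s: str) -> tuple:
    """Return the UTF-16 code units (16-bit words) of s as a tuple of ints."""
    raw = s.encode("utf-16-le")
    return tuple(int.from_bytes(raw[i:i + 2], "little")
                 for i in range(0, len(raw), 2))


def pos(needle: tuple, haystack: tuple) -> int:
    """Naive substring search over code units.

    Returns the 1-based code-unit index of the first match, or 0 if the
    needle is empty or not found (a choice made for this sketch).
    """
    n = len(needle)
    if n == 0:
        return 0
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i + 1
    return 0


# Sample text: "Grüße, <U+1D11E> Welt". The sharp s takes 2 bytes in UTF-8
# but 1 word in UTF-16; U+1D11E takes 4 bytes in UTF-8 and a surrogate
# pair (2 words) in UTF-16.
TEXT = "Gr\u00fc\u00dfe, \U0001D11E Welt"

if __name__ == "__main__":
    for sample in ("\u00df", "\U0001D11E", "Welt"):
        print(repr(sample),
              pos(utf8_units(sample), utf8_units(TEXT)),
              pos(utf16_units(sample), utf16_units(TEXT)))
```

The same pos routine handles a one-unit character, a multi-unit character
and a whole word. Only the reported index differs between the encodings,
because it counts code units rather than characters.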

DoDi



