[fpc-devel] Unicode in the RTL (my ideas)

Martin Schreiber mse00000 at gmail.com
Tue Aug 21 10:17:56 CEST 2012


On 21.08.2012 09:55, Graeme Geldenhuys wrote:
> On 21 August 2012 07:10, Ivanko B <ivankob4mse2 at gmail.com> wrote:
>> How about supporting, in the RTL, all versions of UCS-2 & UTF-16 (for
>> fast per-char access and similar optimizations) and UTF-8 (for an
>> unlimited number of alphabets)?
>
> All "access a char by index into a string" code I have seen works in
> a sequential manner 99.99% of the time. For that reason there is no
> speed difference between using a UTF-16 or a UTF-8 encoded string.
> Both can be coded equally efficiently.
>
Graeme, this is simply not true. When searching for known German
characters in a UnicodeString, the program can use the simple approach
of indexing by character (code unit), because each of them occupies
exactly one UTF-16 code unit. The same works for known Chinese symbols
in the BMP. And a simple "if" for surrogate pairs is more efficient than
a 4-stage "case" for UTF-8.
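To make the comparison concrete, here is a minimal C sketch. The function names are hypothetical, and the code assumes well-formed input held as plain arrays of UTF-16 code units (the encoding a UnicodeString uses) or UTF-8 bytes. `find_bmp_char_utf16` shows the indexed search: a non-surrogate BMP character such as 'ß' (U+00DF) is always a single code unit and can never match half of a surrogate pair, so comparing code units directly is correct. `next_cp_utf16` and `next_cp_utf8` each decode one code point and advance the index. The UTF-16 version needs one `if` for the surrogate pair, while the UTF-8 version branches four ways on the lead byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Indexed search for a non-surrogate BMP character (e.g. U+00DF 'ß').
   In well-formed UTF-16 such a character is always exactly one code unit
   and never equals half of a surrogate pair, so comparing code units
   directly is correct. Returns the code-unit index, or -1 if absent. */
static long find_bmp_char_utf16(const uint16_t *s, size_t len, uint16_t ch)
{
    assert(ch < 0xD800 || ch > 0xDFFF);   /* must not be a surrogate */
    for (size_t i = 0; i < len; i++)
        if (s[i] == ch)
            return (long)i;
    return -1;
}

/* Decode the code point starting at s[*i] and advance *i past it.
   One "if" separates a surrogate pair from a single-unit character. */
static uint32_t next_cp_utf16(const uint16_t *s, size_t len, size_t *i)
{
    assert(*i < len);
    uint16_t hi = s[(*i)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *i < len &&
        s[*i] >= 0xDC00 && s[*i] <= 0xDFFF) {
        uint16_t lo = s[(*i)++];
        return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    }
    return hi;   /* BMP character (or an unpaired surrogate, passed through) */
}

/* UTF-8 counterpart: the lead byte selects one of four sequence lengths.
   Assumes well-formed input; continuation bytes are not validated. */
static uint32_t next_cp_utf8(const uint8_t *s, size_t len, size_t *i)
{
    assert(*i < len);
    uint8_t b = s[*i];
    uint32_t cp;
    int extra;                                       /* continuation bytes */
    if (b < 0x80)      { cp = b;        extra = 0; } /* 1 byte (ASCII) */
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; } /* 2 bytes */
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; } /* 3 bytes */
    else               { cp = b & 0x07; extra = 3; } /* 4 bytes */
    assert(*i + (size_t)extra < len);
    (*i)++;
    for (int k = 0; k < extra; k++)
        cp = (cp << 6) | (uint32_t)(s[(*i)++] & 0x3F);
    return cp;
}
```

For example, scanning the UTF-16 code units of "Straße" for U+00DF finds it at index 4. In UTF-8 the same character is the two-byte sequence C3 9F, so byte offsets and character positions diverge after it.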

Martin


