[fpc-devel] Unicode resource strings

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Aug 21 13:18:00 CEST 2012


Aleksa Todorovic wrote:
> On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B <ivankob4mse2 at gmail.com> wrote:
>> Handling 1..4(6) bytes is less efficient than handling surrogate
>>  *pairs*.
>> ===============
>> But surrogate pairs break array-like fast char access anyway,  isn't it ?
> 
> It's also "broken" in UTF8 in the same way - so none of them gets +1
> on this. UCS4 is the only real winner here (one dword for each
> character).

Even in UCS4, depending on the language, ligatures etc. can still span 
multiple codepoints. IMO everybody should decide whether he wants to do 
text processing for full Unicode, or whether simple string handling (as 
used till now) is sufficient.
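
To illustrate the point about multiple codepoints, here is a minimal sketch in Python (not FPC; the variable names are only illustrative). It shows one visible character that occupies two codepoints, and therefore two dwords in UTF-32, so a fixed-width encoding alone does not give one array element per perceived character.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# one visible character, but two codepoints.
decomposed = "e\u0301"

# Number of codepoints in the decomposed form.
code_points = len(decomposed)

# Number of 32-bit units when encoded as UTF-32 (little endian, no BOM).
utf32_units = len(decomposed.encode("utf-32-le")) // 4

# NFC normalization happens to merge this particular pair into U+00E9,
# but not every combining sequence has a precomposed form.
composed = unicodedata.normalize("NFC", decomposed)
composed_len = len(composed)

print(code_points, utf32_units, composed_len)
```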

I have never heard of non-canonical text causing problems in character 
sets with accents or umlauts, except in (MacOS, Linux) filenames. Since 
file searches have to use the platform API, all required special 
handling can be encapsulated in the RTL.
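
As a rough sketch of the kind of handling that could be hidden behind a library call, the following Python snippet compares two spellings of the same filename after normalizing both to NFC. The helper `same_name` is hypothetical and only for illustration; it is not an existing RTL or platform function.

```python
import unicodedata


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same filename, once with a precomposed u-umlaut (U+00FC) and once
# with "u" followed by U+0308 COMBINING DIAERESIS.
precomposed = "M\u00fcller.txt"
decomposed = "Mu\u0308ller.txt"

# A plain codepoint comparison treats them as different names...
raw_equal = precomposed == decomposed

# ...while the normalizing comparison treats them as equal.
normalized_equal = same_name(precomposed, decomposed)

print(raw_equal, normalized_equal)
```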

Breaking strings into substrings can be done on specific delimiters 
(spaces...), which are all ASCII, so again no complication with UTF. A 
comparison or search for given patterns is also insensitive to the 
encoding, as long as both strings use the same one. Where would one 
really need indexed access to single characters?
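
A minimal sketch of this in Python, working directly on UTF-8 bytes (the sample text is made up). It relies on the fact that in UTF-8 a byte below 0x80 never occurs inside a multi-byte sequence, so splitting on an ASCII space cannot cut a character in half, and a byte-wise search for a valid UTF-8 pattern only matches at character boundaries.

```python
# "Gruesse aus Koeln" written with umlauts and sharp s, encoded as UTF-8.
text = "Gr\u00fc\u00dfe aus K\u00f6ln".encode("utf-8")

# Split on the ASCII space byte; no decoding is needed for this step.
words = text.split(b" ")

# Every piece is still valid UTF-8 and decodes without errors.
decoded = [w.decode("utf-8") for w in words]

# Byte-wise pattern search; the result is a byte offset, not a character index.
pattern = "K\u00f6ln".encode("utf-8")
pos = text.find(pattern)

print(decoded, pos)
```

The offset returned by the search is only useful for slicing the same byte string, which is all that splitting and searching need; no per-character index is required.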

DoDi



