[fpc-devel] Unicode resource strings
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Aug 21 13:18:00 CEST 2012
Aleksa Todorovic wrote:
> On Tue, Aug 21, 2012 at 10:16 AM, Ivanko B <ivankob4mse2 at gmail.com> wrote:
>> Handling 1..4(6) bytes is less efficient than handling surrogate
>> *pairs*.
>> ===============
>> But surrogate pairs break array-like fast char access anyway, don't they?
>
> It's also "broken" in UTF8 in the same way - so none of them gets +1
> on this. UCS4 is the only real winner here (one dword for each
> character).
Depending on the language, ligatures etc. can still span multiple
codepoints, even in UCS4. IMO everybody should decide whether they want
to do text processing for full Unicode, or whether simple string
handling (as used until now) is sufficient.
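A small sketch of the point, in Python only for brevity (the helper
name code_units is made up for this illustration): a character outside
the BMP takes one UCS4 unit, but an accented letter written with a
combining mark still takes two codepoints in every encoding.

```python
def code_units(text):
    """Return (UTF-8 bytes, UTF-16 code units, code points) for text."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")) // 2,
        len(text),  # Python strings are indexed by code point
    )

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
clef = "\U0001D11E"

# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
e_acute_decomposed = "e\u0301"

clef_units = code_units(clef)                  # (4, 2, 1)
accent_units = code_units(e_acute_decomposed)  # (3, 2, 2)
```

So fixed-width codepoints give indexed access to codepoints, not
necessarily to what the user sees as one character.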
I have never heard of non-canonical text causing problems in character
sets with accents or umlauts, except in (MacOS, Linux) filenames. Since
file searches have to use the platform API, all required special
handling can be encapsulated in the RTL.
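To illustrate the filename case, here is a minimal sketch, again in
Python. The helper same_filename and the choice of NFC as the common
form are assumptions for the example, not a description of what the
RTL does: two names that look identical compare unequal until both are
normalized.

```python
import unicodedata

def same_filename(a, b):
    """Compare two names after normalizing both to NFC (assumed policy)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "M\u00fcller.txt"     # u-umlaut as a single code point
decomposed = "Mu\u0308ller.txt"  # "u" followed by COMBINING DIAERESIS

raw_equal = composed == decomposed                      # False
normalized_equal = same_filename(composed, decomposed)  # True
```

A wrapper of this kind around the platform's file API would keep the
normalization out of application code.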
Breaking strings into substrings can be done on specific delimiters
(spaces...), which are all ASCII, so again no complication with UTF. A
comparison or search for given patterns is likewise insensitive to the
encoding. Where would one really need indexed access to single characters?
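A rough sketch of both operations on raw UTF-8 bytes (Python for
brevity; the sample text is invented for the illustration). It relies
on the fact that bytes below 0x80 never occur inside a multi-byte
UTF-8 sequence, so splitting on an ASCII delimiter and searching for
an encoded pattern both work without decoding:

```python
# "Gruesse aus Koeln" with u-umlaut, sharp s and o-umlaut, written
# with escapes so the source stays pure ASCII.
text = "Gr\u00fc\u00dfe aus K\u00f6ln"
raw = text.encode("utf-8")

# Split on the ASCII space byte; no multi-byte sequence is cut apart.
words = [w.decode("utf-8") for w in raw.split(b" ")]

# Byte-level pattern search; the result is a byte offset.
pattern = "K\u00f6ln".encode("utf-8")
byte_pos = raw.find(pattern)

# The same search on code points gives a different index for the
# same match.
char_pos = text.find("K\u00f6ln")
```

The match is found either way; only the meaning of the returned
offset (bytes versus codepoints) differs.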
DoDi