[fpc-devel] Unicode RTL

Thu Nov 17 01:04:28 CET 2005

> *sigh* Yes, what he says is correct. Now to do something with
> strings. I.e. reverse them, or any other operation that needs to split
> the string into pieces.
Reversing a string properly requires a very deep understanding of Unicode
and huge lookup tables (reversing the code point order will break down when
combining characters come in). Luckily it is very rarely required outside
toy examples.
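
To illustrate the combining character problem, here is a rough Python sketch.
The function names are made up for this example, and grouping by
unicodedata.combining() is only a simplification: it handles plain combining
marks, not everything a full implementation would have to care about.

```python
import unicodedata

def reverse_code_points(s):
    # Naive reversal: flips the order of code points, nothing more.
    return s[::-1]

def reverse_keeping_marks(s):
    # Simplified approach (an assumption for this sketch): glue every
    # combining mark to the base character before it, then reverse the groups.
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))

# "n", "e", U+0301 COMBINING ACUTE ACCENT, "e": the accent sits on the first e.
word = "ne\u0301e"

naive = reverse_code_points(word)     # accent ends up on the wrong "e"
better = reverse_keeping_marks(word)  # accent stays with its original "e"
```

In the naive result the accent has moved to the other "e", which is exactly
the breakage described above.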

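Tokenising a UTF-8 string does not require any special precautions because
substring matching doesn't require any: the bytes that start a character
never occur in the middle of one, so a match can only begin and end on a
character boundary.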
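
A small Python sketch of the tokenising point, working on raw bytes with no
knowledge of UTF-8 at all. The helper name and the sample data are invented
for the example.

```python
def split_utf8(data, delimiter):
    # Plain byte-level split; nothing here knows about character boundaries.
    return data.split(delimiter)

# Sample line with 2-byte and 3-byte characters around ASCII commas.
line = "na\u00efve,\u65e5\u672c\u8a9e,caf\u00e9".encode("utf-8")
fields = [f.decode("utf-8") for f in split_utf8(line, b",")]

# A multibyte delimiter (U+2192 RIGHTWARDS ARROW) works the same way.
arrow = "\u2192".encode("utf-8")
parts = [p.decode("utf-8") for p in split_utf8("a\u2192\u00e9\u2192c".encode("utf-8"), arrow)]

# Every byte of a non-ASCII character is >= 0x80, so it can never be
# mistaken for an ASCII delimiter such as "," (0x2C).
all_high = all(b >= 0x80 for b in "\u65e5".encode("utf-8"))
```

Every field decodes cleanly, which would not be guaranteed with the older
multibyte encodings where a trail byte can look like an ASCII character.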

Blind truncation (say, to fit a length limit) can leave a partial character at
the end, but the nature of UTF-8 means that these are much less of a problem
than with more traditional multibyte encodings and are easy to clean up
later.
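
For what it's worth, the cleanup can be sketched in a few lines of Python.
This assumes the input is valid UTF-8 to begin with; truncate_utf8 is a
hypothetical helper, not something from any existing library.

```python
def truncate_utf8(data, max_bytes):
    # Cut to at most max_bytes without leaving a partial character behind.
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # Continuation bytes have the form 10xxxxxx. If the first dropped byte
    # is one of them, the cut fell inside a character, so back up.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]

raw = "h\u00e9llo".encode("utf-8")   # b'h\xc3\xa9llo', 6 bytes

clean_cut = truncate_utf8(raw, 2)    # backs up to just b'h'
exact_cut = truncate_utf8(raw, 3)    # b'h\xc3\xa9', already on a boundary

# Cleaning up after a blind cut: the dangling lead byte is simply dropped.
blind = raw[:2]                      # b'h\xc3'
repaired = blind.decode("utf-8", errors="ignore")
```

Because lead bytes and continuation bytes are distinguishable on sight, the
loop never has to look back more than three bytes in valid input.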

>
> Try to do any operating with has to do with the order of characters (i.e.
> compare strings).
Sorting/comparing UTF-8 strings with a byte-oriented sort routine will
have the same effect as sorting/comparing by Unicode code point. If you
want a culturally appropriate sort order you will need special code, and you
would have needed it even with a single-byte encoding in most cases.
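
A quick way to check the sorting claim, sketched in Python (whose built-in
string comparison goes by code point). The sample words and the helper name
are arbitrary choices for the example.

```python
def sort_by_utf8_bytes(words):
    # Sort using nothing but byte-wise comparison of the UTF-8 encodings.
    encoded = sorted(w.encode("utf-8") for w in words)
    return [b.decode("utf-8") for b in encoded]

# Mix of 1-, 2-, 3- and 4-byte characters.
words = ["z", "\u00e9", "\u20ac", "\U0001F600", "a", "\uFFFF", "ab"]

by_bytes = sort_by_utf8_bytes(words)
by_code_point = sorted(words)   # Python compares str values by code point

same_order = (by_bytes == by_code_point)
```

Neither order is what a dictionary would use (for instance "z" sorts before
"\u00e9" here), which is the point about needing special code for that.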

>
> If all you did need to do was nothing, people wouldn't be begging for
> Unicode support, right?

MS made the understandable but highly annoying decision to use a totally
separate Unicode API. This means that apps that don't use Unicode are
limited to whatever ANSI code page the system is using (which, btw, can be
multibyte and a lot nastier to naive apps than UTF-8).