[fpc-devel] Unicode RTL
plugwash at P10Link.net
Thu Nov 17 01:04:28 CET 2005
> *sigh* Yes, what he says is correct. Now to do something with
> strings. I.e. reverse them, or any other operation that needs to split
> the string into pieces.
Reversing a string properly requires a very deep understanding of Unicode
and huge lookup tables (reversing the code point order will break down when
combining characters come in). Luckily it is very rarely required outside
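
To make the combining character problem concrete, here is a small Python sketch. The helper name reverse_keeping_marks is made up for illustration, and it only keeps combining marks attached to their base character. It is not a full grapheme cluster implementation, which is where the big lookup tables come in.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT, then "x".
s = "e\u0301x"

# Naive reversal of the code point order: the accent ends up after "x".
naive = s[::-1]


def reverse_keeping_marks(text):
    """Reverse text while keeping combining marks after their base character.

    Hypothetical helper: it handles only characters with a non-zero
    canonical combining class, not every grapheme cluster rule.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))


better = reverse_keeping_marks(s)
```

Here naive puts the accent on the "x", while better keeps it on the "e".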
Tokenising a UTF-8 string does not require any special precautions because
substring matching doesn't require any: a match of one valid UTF-8 string
inside another always starts on a character boundary.
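
As a rough illustration (Python, with sample strings invented for the example), splitting the raw bytes on a delimiter gives the same tokens as splitting the decoded text, whether the delimiter is ASCII or itself a multibyte character:

```python
# Sample text: "naïve,日本語,ok" written with escapes.
text = "na\u00efve,\u65e5\u672c\u8a9e,ok"
data = text.encode("utf-8")

# Split on the raw byte for "," with no knowledge of character boundaries.
byte_tokens = [t.decode("utf-8") for t in data.split(b",")]

# The same works with a multibyte delimiter, here U+2192 RIGHTWARDS ARROW.
arrow_text = "a\u00e9\u2192\u65e5\u2192z"
arrow = "\u2192".encode("utf-8")
arrow_tokens = [t.decode("utf-8") for t in arrow_text.encode("utf-8").split(arrow)]
```

Every token decodes cleanly because the delimiter bytes can never match in the middle of some other character.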
Blind truncation (say to fit a length limit) can leave a partial character on
the end, but the nature of UTF-8 means that these are much less of a problem
than with more traditional multibyte encodings and are easy to clean up.
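
A minimal sketch of such a cleanup in Python follows. The function name drop_partial_tail is invented here, and it assumes the data was valid UTF-8 before it was cut. It steps back over trailing continuation bytes (10xxxxxx), reads the expected length from the lead byte, and drops the last character if it is incomplete.

```python
def drop_partial_tail(data: bytes) -> bytes:
    """Remove an incomplete UTF-8 character from the end of data, if any.

    Assumes data is a prefix of valid UTF-8 (hypothetical helper).
    """
    i = len(data)
    # Step back over trailing continuation bytes (0b10xxxxxx).
    while i > 0 and (data[i - 1] & 0xC0) == 0x80:
        i -= 1
    if i == 0:
        return data  # empty, or no lead byte to inspect
    lead = data[i - 1]
    if lead < 0x80:
        need = 1  # ASCII
    elif lead >= 0xF0:
        need = 4
    elif lead >= 0xE0:
        need = 3
    else:
        need = 2
    have = len(data) - (i - 1)
    return data if have >= need else data[: i - 1]


# "aé日" is 1 + 2 + 3 = 6 bytes in UTF-8.
full = "a\u00e9\u65e5".encode("utf-8")
cleaned = drop_partial_tail(full[:5])  # blind cut inside the last character
```

At most three bytes are removed, and no state from earlier in the string is needed.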
> Try to do any operation which has to do with the order of characters (i.e.
> compare strings).
Sorting/comparing UTF-8 strings with a byte-orientated sort routine will
have the same effect as sorting/comparing by Unicode code point. If you
want a culturally appropriate sort you will need special code, and you
would have needed it even with a single byte encoding in most cases.
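
This is easy to check in Python, where plain string comparison goes by code point. The word list below is made up for the example and includes a character outside the BMP:

```python
# Mixed sample: ASCII, Latin-1 range, CJK, a fullwidth form and an emoji.
words = ["z", "\u00e9", "a", "\u65e5", "\U0001F600", "\uff5e", "Z"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# Compare the raw UTF-8 bytes instead, as a byte-orientated sort would.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

same_order = by_code_point == by_bytes
```

Both orders agree, and both put "Z" before "a", which shows why a culturally appropriate sort needs extra code whatever the encoding.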
> If all you did need to do was nothing, people wouldn't be begging for
> Unicode support, right?
MS made the understandable but highly annoying decision to use a totally
separate Unicode API. This means that apps that don't use Unicode are
limited to whatever ANSI code page the system is using (which btw can be
multibyte and a lot nastier to naive apps than UTF-8).