[fpc-devel] Unicode support - for the 20th time... ;-)

Thu Nov 20 13:24:43 CET 2008

On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote:

> I think basing those functions on code points should suffice.  I also
> think as soon as strings are assigned or loaded from file, they should
> be normalized. So two code points like the A and Umlaut code points
> would become one.

How would one know which code points were originally decomposed and
which weren't? Should it be impossible to save a file that
demonstrates the different possible Unicode representations of e.g. ö
(composed and decomposed), and should a loaded/saved file which
contained both representations really be automatically entirely
composed or decomposed when saved again?
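
To illustrate the information loss, here is a minimal sketch in Python,
using its standard unicodedata module purely for illustration (the
variable names are made up for this example, and this is not a proposal
for an FPC API). It builds a string holding both forms of ö and shows
that neither normalization form gives the original back:

```python
import unicodedata

composed = "\u00f6"      # ö as a single code point (U+00F6)
decomposed = "o\u0308"   # o followed by COMBINING DIAERESIS (U+0308)

# A string that deliberately contains both forms, e.g. the contents of
# a file written to demonstrate the difference.
mixed = composed + decomposed

nfc = unicodedata.normalize("NFC", mixed)  # everything composed
nfd = unicodedata.normalize("NFD", mixed)  # everything decomposed

print(len(mixed), len(nfc), len(nfd))  # 3 2 4 (counted in code points)
print(nfc == composed * 2)             # True
print(nfd == decomposed * 2)           # True

# After normalizing on load, nothing records which ö was which, so the
# original mixed string cannot be written out again.
print(nfc == mixed, nfd == mixed)      # False False
```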

I know of no text editor that handles Unicode which automatically
changes the representation of pre-existing characters when saving a
document. And I would never want to use a text editor which does that
by default.

> The .SaveToFile() methods could take an optional parameter to decide
> if the normalized version of the string gets saved, or if it must be
> split again - which I think Mac OS-X prefers.

It doesn't. All OS functions that return file/path names return
decomposed (UTF-8) strings. They accept both composed and decomposed
strings. Text files are text files and can have any encoding you want,
with any combination of composed and decomposed characters.
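
As a small sketch of the text-file point (again in Python, with a
hypothetical round_trip helper invented for this example), a UTF-8 file
can hold both forms side by side, and a plain load/save leaves them
exactly as they were:

```python
import os
import tempfile

composed = "\u00f6"      # U+00F6
decomposed = "o\u0308"   # U+006F followed by U+0308
text = "composed: " + composed + "\ndecomposed: " + decomposed + "\n"


def round_trip(s):
    """Write s to a temporary UTF-8 file and read it back, untouched."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # newline="" disables newline translation in both directions.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(s)
        with open(path, "r", encoding="utf-8", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


loaded = round_trip(text)
print(loaded == text)               # True: both forms survive
print(composed.encode("utf-8"))     # b'\xc3\xb6'
print(decomposed.encode("utf-8"))   # b'o\xcc\x88'
```

The two byte sequences differ, which is why a byte-wise comparison of
a composed and a decomposed file name fails even though both display
as the same character.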

Jonas