[fpc-devel] Unicode support - for the 20th time... ;-)

Jonas Maebe jonas.maebe at elis.ugent.be
Thu Nov 20 13:24:43 CET 2008


On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote:

> I think basing those functions on code points should suffice.  I also
> think as soon as strings are assigned or loaded from file, they should
> be normalized. So two code points like the A and Umlaut code points
> would become one.


How would one know which code points were originally decomposed and  
which weren't? Should it be impossible to save a file that  
demonstrates the different possible UTF encodings of e.g. ö, and  
should a loaded/saved file which contained both encodings really be  
automatically entirely composed or decomposed when saved again?
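To make the point concrete, here is a minimal Python sketch. It is not FPC code. The variable names and the choice of NFC as the "normalize on load" form are assumptions for illustration only. The sketch shows that "ö" has two distinct UTF-8 byte sequences, and that normalizing a document which deliberately contains both forms loses the distinction, so saving it back changes the file:

```python
import unicodedata

# Two representations of "ö": precomposed U+00F6 vs. "o" + COMBINING DIAERESIS (U+0308)
composed = "\u00f6"
decomposed = "o\u0308"

# A hypothetical file that deliberately demonstrates both forms
document = composed + " " + decomposed

def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s as space-separated hex."""
    return s.encode("utf-8").hex(" ")

print(utf8_hex(composed))    # c3 b6
print(utf8_hex(decomposed))  # 6f cc 88

# Normalizing the whole document on load (here to NFC) collapses both forms into one
nfc_document = unicodedata.normalize("NFC", document)
print(nfc_document == composed + " " + composed)  # True: the decomposed form is gone
print(nfc_document == document)                   # False: saving it back alters the file
```

Normalizing to NFD instead fails the same way in the other direction: the composed form disappears.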

I know of no UTF-capable text editor that automatically changes the  
encoding of pre-existing characters when saving a document. And I  
would never want to use a text editor that did so by default.

> The .SaveToFile() methods could take an optional parameter to decide
> if the normalized version of the string gets saved, or if it must be
> split again - which I think Mac OS-X prefers.

It doesn't. All OS functions that return file/path names return  
decomposed (UTF-8) strings. They accept both composed and decomposed  
strings. Text files are text files and can have any encoding you want,  
with any combination of composed and decomposed characters.
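One way to cope with OS functions returning decomposed names is to leave stored strings untouched and normalize only at comparison time. The following Python sketch assumes this approach. The helper `same_name` and the example file name are hypothetical, and plain NFD is used only as an approximation of the decomposed form such an OS returns:

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Compare two names for canonical equivalence without modifying either one."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# Hypothetical example: a name typed by the user (composed) vs. the same name
# as it might come back from a directory listing in decomposed form.
typed = "R\u00f6ntgen.txt"
listed = unicodedata.normalize("NFD", typed)

print(typed == listed)           # False: different code point sequences
print(same_name(typed, listed))  # True: canonically equivalent
```

With this design, each string keeps whatever form it arrived in, which is the behaviour argued for above. Only the comparison treats composed and decomposed sequences as equal.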


Jonas

