[fpc-devel] Unicode support - for the 20th time... ;-)
jonas.maebe at elis.ugent.be
Thu Nov 20 13:24:43 CET 2008
On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote:
> I think basing those functions on code points should suffice. I also
> think as soon as strings are assigned or loaded from file, they should
> be normalized. So two code points like the A and Umlaut code points
> would become one.
How would one know which code points were originally decomposed and
which weren't? Should it be impossible to save a file that
demonstrates the different possible Unicode representations of e.g. ö
(precomposed versus decomposed), and should a loaded/saved file which
contained both forms really be automatically entirely composed or
decomposed when saved again?
I know of no Unicode-aware text editor that automatically changes the
representation of pre-existing characters when saving a document. And
I would never want to use a text editor which does that by default.
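For concreteness, here is a minimal sketch of the two representations of ö and of what normalizing on load would do to a file containing both. Python and its standard unicodedata module are used purely for illustration (nothing here is FPC API), and the helper name code_points is hypothetical.

```python
import unicodedata

# The same visible character, represented in two different ways.
composed = "\u00f6"      # LATIN SMALL LETTER O WITH DIAERESIS (one code point)
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS (two code points)

def code_points(s):
    """Hypothetical helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]

# A "file" that deliberately contains both representations.
mixed = composed + decomposed

# Normalizing the whole string erases the distinction in either direction.
nfc = unicodedata.normalize("NFC", mixed)   # everything composed
nfd = unicodedata.normalize("NFD", mixed)   # everything decomposed

print(code_points(mixed))
print(code_points(nfc))
print(code_points(nfd))
```

After either normalization, the information about which ö was originally composed and which was decomposed is gone, which is the loss described above.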
> The .SaveToFile() methods could take an optional parameter to decide
> if the normalized version of the string gets saved, or if it must be
> split again - which I think Mac OS-X prefers.
It doesn't. All OS functions that return file/path names return
decomposed (UTF-8) strings. They accept both composed and decomposed
strings. Text files are text files and can have any encoding you want,
with any combination of composed and decomposed characters.
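One practical consequence is that a name returned by the OS may be decomposed while the same name typed by a user is composed, so a plain string comparison can fail. A minimal sketch of comparing such names, again in Python for illustration only; same_name is a hypothetical helper, and the choice of NFC as the common form is an assumption (NFD would work equally well):

```python
import unicodedata

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after bringing both
    to the same normalization form (NFC is an arbitrary choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

typed = "h\u00f6ren.txt"       # composed, e.g. as entered by a user
from_os = "ho\u0308ren.txt"    # decomposed, e.g. as returned by a file API

print(typed == from_os)            # raw comparison of code points
print(same_name(typed, from_os))   # comparison after normalization
```

This normalizes only for the comparison and leaves the stored strings untouched.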