<html><head><meta name="qrichtext" content="1" /></head><body style="font-size:10pt;font-family:Sans Serif">
<p>On Sunday 28 September 2008 00.10:43 Graeme Geldenhuys wrote:</p>
<p>> On Fri, Sep 26, 2008 at 5:02 PM, Mattias Gaertner &lt;nc-gaertnma@netcologne.de&gt; wrote:</p>
<p>> > s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.</p>
<p>> ></p>
<p>> > In short:</p>
<p>> > A single character for all purposes can not be defined. Unicode can not</p>
<p>> > be handled as array of character.</p>
<p>></p>
<p>> This is what I thought, but everybody seems to side step the answer.</p>
<p>> Thanks Mattias for confirming this. Like I told Martin in one of my</p>
<p>> replies. In the last four years I have not needed indexing into a</p>
<p>> character array, and if I have to parse a string, it's normally</p>
<p>> sequential anyway, which is then easy to track each character in UTF-8,</p>
<p>> even if multi-byte characters are used.</p>
<p>></p>
<p>></p>
<p>Note that UTF8CharAtByte() won't work in Mattias' example either.</p>
<p>It seems that Apple decided to use two characters from the BMP to denote umlauts.</p>
<p>Example for ä (U+00E4 LATIN SMALL LETTER A WITH DIAERESIS):</p>
<p>a (U+0061 LATIN SMALL LETTER A) followed by ¨ (U+0308 COMBINING DIAERESIS).</p>
<p>Mattias, please correct me if I am wrong.</p>
<p>So the problem is not that the characters don't fit in the UCS-2 range; the problem is that Apple uses the decomposed forms of umlauts.</p>
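<p>To illustrate the point, here is a minimal Python sketch (Python is used only for illustration, the variable names are mine). It shows that the two forms of ä compare as different strings although both stay inside the BMP, and that normalization maps one onto the other.</p>

```python
import unicodedata

composed = "\u00e4"      # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 followed by U+0308 COMBINING DIAERESIS

# Same visible character, different code point sequences.
print(composed == decomposed)                  # False
print(len(composed), len(decomposed))          # 1 2

# The byte lengths in UTF-8 differ as well.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3

# Both forms fit into the BMP, so the UCS-2 range is not the issue.
fits_in_bmp = 0xFFFF >= max(ord(c) for c in composed + decomposed)
print(fits_in_bmp)                             # True

# Normalization converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```

<p>Indexing the decomposed string by code point returns 'a' and the combining mark separately, whatever the encoding is.</p>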
<p>If you work with OS X HFS and fpGUI uses the composed form internally, you must convert the filenames to the composed normal form before processing them in fpGUI.</p>
<p>This is independent of using UTF-8, UTF-16, UTF-32 or UCS-2. You need conversion tables to do so, and again, it is easier to handle with widestrings than with UTF-8 strings if you don't need characters outside the BMP.</p>
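<p>And even if you want to support the full Unicode code point range, it is simpler with UTF-16 because there are surrogate *pairs* only: a code point outside the BMP always takes exactly two 16-bit units, while a UTF-8 sequence can be one to four bytes long.</p>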
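<p>A rough Python sketch of the surrogate pair argument follows. The helper names utf16_units and count_code_points are hypothetical, and the counting assumes well-formed UTF-16 input.</p>

```python
def utf16_units(text):
    """Return the 16-bit code units of text encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    # High (leading) surrogates are 0xD800..0xDBFF.
    return unit in range(0xD800, 0xDC00)


def is_low_surrogate(unit):
    # Low (trailing) surrogates are 0xDC00..0xDFFF.
    return unit in range(0xDC00, 0xE000)


def count_code_points(units):
    # Every code point contributes exactly one unit that is not a low
    # surrogate, so skipping low surrogates counts code points.
    return sum(1 for unit in units if not is_low_surrogate(unit))


bmp_units = utf16_units("\u00e4")          # inside the BMP: one unit
clef_units = utf16_units("\U0001D11E")     # outside the BMP: one pair

print([hex(u) for u in bmp_units])         # ['0xe4']
print([hex(u) for u in clef_units])        # ['0xd834', '0xdd1e']
print(count_code_points(utf16_units("a\U0001D11Eb")))   # 3
print(len("\U0001D11E".encode("utf-8")))   # 4
```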
<p></p>
<p>In MSEgui I would implement the normalization in the MSEgui filename routines; MSEgui uses a normalized cross-platform filename scheme anyway.</p>
<p>Win32 'c:\aaaa\bbb.ext' will be normalized to the MSEgui form '/c:/aaaa/bbb.ext'; Unicode composed normalization can be done in the same step.</p>
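<p>As a sketch of the idea only, and not the actual MSEgui routine, both steps could be combined as below. The function name to_internal_filename is hypothetical, and the drive letter handling is an assumption derived from the single example above.</p>

```python
import unicodedata


def to_internal_filename(path):
    """Hypothetical helper: unify separators, then compose (NFC)."""
    # Step 1: use '/' as the only directory separator.
    unified = path.replace("\\", "/")
    # Assumption: a leading drive letter such as 'c:' gets a '/' prefix,
    # as in the example 'c:\aaaa\bbb.ext' giving '/c:/aaaa/bbb.ext'.
    if len(unified) >= 2 and unified[0].isalpha() and unified[1] == ":":
        unified = "/" + unified
    # Step 2: convert decomposed sequences (as delivered by HFS) to the
    # composed normal form.
    return unicodedata.normalize("NFC", unified)


print(to_internal_filename("c:\\aaaa\\bbb.ext"))              # /c:/aaaa/bbb.ext
print(to_internal_filename("a\u0308.txt") == "\u00e4.txt")    # True
```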
<p></p>
<p>An article about Unicode normalization:</p>
<p></p>
<p>http://en.wikipedia.org/wiki/Unicode_normalization</p>
<p></p>
<p>Martin</p>
</body></html>