[fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

Martin Schreiber fpmse at bluewin.ch
Sun Sep 28 09:23:14 CEST 2008


On Sunday 28 September 2008 00.10:43 Graeme Geldenhuys wrote:
> On Fri, Sep 26, 2008 at 5:02 PM, Mattias Gaertner
> <nc-gaertnma at netcologne.de> wrote:
> > s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.
> >
> > In short:
> > A single character for all purposes can not be defined. Unicode can not
> > be handled as array of character.
>
> This is what I thought, but everybody seems to side step the answer.
> Thanks Mattias for confirming this. Like I told Martin in one of my
> replies. In the last four years I have not needed indexing into a
> character array, and if I have to parse a string, it's normally
> sequential anyway, which is then easy to track each character in UTF-8,
> even if multi-byte characters are used.
>
>
Note that UTF8CharAtByte() won't work in Mattias' example either.
It seems that Apple decided to use two characters from the BMP to denote umlauts.
Example for ä (U+00E4 LATIN SMALL LETTER A WITH DIAERESIS):
a (U+0061 LATIN SMALL LETTER A) followed by ¨ (U+0308 COMBINING DIAERESIS).
Mattias, please correct me if I am wrong.
So the problem is not that the characters don't fit in the UCS2 range; the problem is that Apple uses the decomposed forms of umlauts.
If you work with OS X HFS and fpGUI uses the composed form internally, you must convert the filenames to the composed normal form before processing them in fpGUI.
This is independent of using utf-8, utf-16, utf-32 or UCS2. You need conversion tables to do so, and again it is easier to handle with widestrings than with utf-8 strings if you don't need characters outside the BMP.
And even if you want to support the full Unicode code point range, it is simpler with utf-16 because there are surrogate *pairs* only.
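
To make the composed/decomposed point concrete, here is a minimal sketch in Python (not Pascal, only because its standard library ships the conversion tables in the unicodedata module). The helper name to_composed is made up for illustration, and the sketch assumes that the decomposed names coming from HFS are close enough to standard NFD for NFC to recompose them.

```python
import unicodedata

# The same visible character "ä" in its two forms.
composed = "\u00e4"      # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 followed by U+0308 COMBINING DIAERESIS


def to_composed(name: str) -> str:
    """Hypothetical helper: convert a name to the composed normal form (NFC)."""
    return unicodedata.normalize("NFC", name)


# The two forms do not compare equal until they are normalized.
print(composed == decomposed)                # False
print(to_composed(decomposed) == composed)   # True

# The decomposed form is two code points in every encoding form,
# so neither utf-8 nor utf-16 makes the issue go away.
print(len(decomposed))                       # 2 code points
print(len(decomposed.encode("utf-8")))       # 3 bytes
print(len(decomposed.encode("utf-16-le")))   # 4 bytes, i.e. 2 UTF-16 code units
```

Both code points are inside the BMP, which is why the encoding choice is irrelevant here and only normalization helps.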

In MSEgui I would implement the normalization in the MSEgui filename routines; MSEgui uses a normalized cross-platform filename scheme anyway.
Win32 'c:\aaaa\bbb.ext' is normalized to the MSEgui form '/c:/aaaa/bbb.ext'; Unicode composed normalization can be done in the same step.
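
A rough sketch of doing both steps at once, again in Python. This is not the MSEgui code: the function name to_portable_filename and the drive-letter check are assumptions for illustration, and only the single mapping given above ('c:\aaaa\bbb.ext' to '/c:/aaaa/bbb.ext') is taken from the actual scheme.

```python
import unicodedata


def to_portable_filename(win_path: str) -> str:
    """Hypothetical helper: map a Win32 path to the '/c:/...' form and compose it."""
    # Step 1: use forward slashes as the only separator.
    path = win_path.replace("\\", "/")
    # Step 2: put a leading slash in front of a drive specification like 'c:'.
    if len(path) >= 2 and path[0].isascii() and path[0].isalpha() and path[1] == ":":
        path = "/" + path
    # Step 3: Unicode composed normalization (NFC) in the same pass.
    return unicodedata.normalize("NFC", path)


print(to_portable_filename("c:\\aaaa\\bbb.ext"))   # /c:/aaaa/bbb.ext
# A decomposed umlaut in the name comes out composed.
print(to_portable_filename("c:\\a\u0308.txt") == "/c:/\u00e4.txt")   # True
```

Doing the normalization inside the filename routine means callers never see the decomposed form, whatever the platform delivers.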

An article about Unicode normalization:

http://en.wikipedia.org/wiki/Unicode_normalization

Martin