[fpc-pascal] charset conversion
Daniël Mantione
daniel.mantione at freepascal.org
Thu Dec 7 17:08:30 CET 2006
On Thu, 7 Dec 2006, Dominique Leducq wrote:
> The change you made to cwstring merely exports the iconv API from it. As
> I stated, I already know how to use iconv on Linux/Unix.
Yes, I only gave it as an example of how I worked around it.
> And I don't necessarily have to deal with widestrings (for example to convert latin-1 to
> UTF-8).
I think it will be best to convert to widestring on input and to the
desired encoding on output; FPC's support is being built around
the widestring manager, which is, as its name says, mostly designed for
widestrings.
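
To make the "wide on input, target encoding on output" idea concrete, here
is a minimal sketch in C for the Latin-1 to UTF-8 case mentioned above. It
is not FPC's widestring manager; the function names latin1_to_wide and
wide_to_utf8 are hypothetical, and the encoder assumes its input is valid
code points (it does not reject surrogates or values above U+10FFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Input step: decode ISO-8859-1 into code points. Every Latin-1 byte
   has the same numeric value as its Unicode code point. */
size_t latin1_to_wide(const unsigned char *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* Output step: encode code points as UTF-8. The caller must provide
   room for up to 4 bytes per code point. Returns the bytes written. */
size_t wide_to_utf8(const uint32_t *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = in[i];
        if (c < 0x80) {
            out[o++] = (unsigned char)c;
        } else if (c < 0x800) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```

Going through the wide form means each supported encoding needs only one
decoder and one encoder, rather than a converter for every pair.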
> On MS Windows, I found WideCharToMultiByte and MultiByteToWideChar, which take
> a numerical CodePage as parameter. But I don't know how to map a charset name
> ('UTF-8', 'Latin-1', 'ISO-8859-15'...) to a CodePage (any hints?).
I'm not sure whether all ISO encodings have a code page number; some
definitely do. IBM keeps a registry of all code pages they have defined:
http://www-03.ibm.com/servers/eserver/iseries/software/globalization/codepages.html
Also of use might be the IANA registry of character sets. These are the
encodings that are allowed to be used on the Internet, mostly a
subset of the ISO encodings combined with a subset of the IBM code pages.
http://www.iana.org/assignments/character-sets
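
Until a portable API exists, one way to bridge names and numbers is a small
lookup table. The sketch below is in C and covers only a handful of entries;
the function name charset_to_codepage and the table layout are hypothetical.
The numbers are Windows code page identifiers, and a real table would have
to be filled in from the registries above.

```c
#include <ctype.h>
#include <stddef.h>

struct charset_entry {
    const char *name;   /* charset name, matched case-insensitively */
    int codepage;       /* Windows code page identifier */
};

/* A deliberately small, incomplete table. */
static const struct charset_entry charset_table[] = {
    { "UTF-8",        65001 },
    { "US-ASCII",     20127 },
    { "ISO-8859-1",   28591 },
    { "latin1",       28591 },  /* IANA alias of ISO-8859-1 */
    { "ISO-8859-15",  28605 },
    { "windows-1252",  1252 },
};

/* ASCII case-insensitive string equality. */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the code page for a charset name, or -1 if it is not listed. */
int charset_to_codepage(const char *name)
{
    size_t count = sizeof charset_table / sizeof charset_table[0];
    for (size_t i = 0; i < count; i++) {
        if (names_equal(name, charset_table[i].name))
            return charset_table[i].codepage;
    }
    return -1;
}
```

A non-negative result can then be passed as the CodePage argument of
MultiByteToWideChar or WideCharToMultiByte.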
> So apparently there doesn't exist a portable API on top of this.
Not yet. Feel free to propose one.
> P.S.: About the cwstring unit, I found by experiment that nl_langinfo(CODESET) always
> gives ANSI_X3.4-1968 (US-ASCII), unless you call setlocale(LC_CTYPE, ...)
> first (... being an explicit locale name, or an empty string to use the
> locale environment variable settings). So the widestring manager doesn't work as
> expected...
Okay, noted.
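
For reference, the behaviour described in the P.S. follows from the C
library: a program starts in the "C" locale until it calls setlocale. A
minimal sketch in C of the call order that picks up the user's codeset
(the function name environment_codeset is hypothetical):

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Selects the LC_CTYPE locale from the environment (LC_ALL, LC_CTYPE,
   LANG) and copies the resulting codeset name into buf. If the
   environment names an unavailable locale, setlocale returns NULL and
   the current locale, normally "C", stays in effect. size must be at
   least 1. */
void environment_codeset(char *buf, size_t size)
{
    setlocale(LC_CTYPE, "");
    strncpy(buf, nl_langinfo(CODESET), size - 1);
    buf[size - 1] = '\0';
}
```

The name is copied out because the string returned by nl_langinfo may be
overwritten by later calls to nl_langinfo or setlocale.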
Daniël