[fpc-pascal] charset conversion

Daniël Mantione daniel.mantione at freepascal.org
Thu Dec 7 17:08:30 CET 2006



On Thu, 7 Dec 2006, Dominique Leducq wrote:

> The changes you made to cwstring is merely to export the iconv API from it. As
> I stated, I already know how to use iconv on Linux/Unix.

Yes, I only gave it as an example of how I worked around it.

> And I don't necessarily have to deal with widestrings (for example to convert latin-1 to
> UTF-8).

I think it is best to convert to widestring on input and to the
desired encoding on output. FPC's charset support is being built around
the widestring manager, which, as its name says, is mostly designed for
widestrings.
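
To illustrate the "wide on input, target encoding on output" idea, here is
a minimal sketch in C rather than Pascal. It is not FPC's widestring
manager and does not use iconv; the function names latin1_to_wide and
wide_to_utf8 are made up for this example. It relies on the fact that
every Latin-1 byte value equals its Unicode code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Input step: decode Latin-1 into wide (UTF-32) code points.
 * Each Latin-1 byte equals its Unicode code point, so this is a copy.
 * dst must have room for len elements. Hypothetical helper name. */
size_t latin1_to_wide(const unsigned char *src, size_t len, uint32_t *dst)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    return len;
}

/* Output step: encode wide code points as UTF-8.
 * dst must have room for 4 * len bytes plus a terminating NUL.
 * Returns the number of bytes written, not counting the NUL.
 * Sketch only: surrogates and values above U+10FFFF are not rejected. */
size_t wide_to_utf8(const uint32_t *src, size_t len, char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c < 0x80) {
            dst[n++] = (char)c;
        } else if (c < 0x800) {
            dst[n++] = (char)(0xC0 | (c >> 6));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            dst[n++] = (char)(0xE0 | (c >> 12));
            dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        } else {
            dst[n++] = (char)(0xF0 | (c >> 18));
            dst[n++] = (char)(0x80 | ((c >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}
```

Any other input encoding only needs its own "to wide" step, and any other
output encoding its own "from wide" step, which is the reason for going
through the wide form.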

> On MS Windows, I found WideCharToMultiByte and MultiByteToWideChar, which take
> a numerical CodePage as parameter. But I don't know how to map a charset name
> ('UTF-8', 'Latin-1', 'ISO-8859-15'...) to a CodePage (any hint ?).

I'm not sure whether all ISO encodings have a code page number; some
definitely do. IBM keeps a registry of all the code pages it has defined:

http://www-03.ibm.com/servers/eserver/iseries/software/globalization/codepages.html

The IANA registry of character sets might also be of use. These are the
encodings allowed for use on the Internet, mostly a subset of the ISO
encodings combined with a subset of the IBM code pages.

http://www.iana.org/assignments/character-sets
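
For the name-to-code-page question, one possible approach is a small
lookup table, sketched here in C. The table, the struct, and the function
names are hypothetical; the numbers are the Windows code page identifiers
as I know them (65001 for UTF-8, 28591 for ISO-8859-1, and so on). It
covers only a handful of names, and "Latin-1" is included as the informal
spelling used in this thread rather than as a registered alias.

```c
#include <stddef.h>

/* Hypothetical table entry: charset name and Windows code page number. */
struct charset_entry {
    const char *name;
    int codepage;
};

static const struct charset_entry charset_table[] = {
    { "UTF-8",        65001 },
    { "US-ASCII",     20127 },
    { "ISO-8859-1",   28591 },
    { "Latin-1",      28591 },  /* informal spelling from the thread */
    { "ISO-8859-2",   28592 },
    { "ISO-8859-15",  28605 },
    { "windows-1252",  1252 },
};

/* ASCII-only lower-casing, independent of the current locale. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive comparison of two charset names (ASCII only). */
static int names_equal(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
    }
    return *a == *b;  /* equal only if both strings ended together */
}

/* Returns the code page for a charset name, or -1 if it is not listed.
 * -1 is used because 0 already means CP_ACP on Windows. */
int charset_to_codepage(const char *name)
{
    size_t count = sizeof charset_table / sizeof charset_table[0];
    for (size_t i = 0; i < count; i++) {
        if (names_equal(name, charset_table[i].name))
            return charset_table[i].codepage;
    }
    return -1;
}
```

The result could then be passed as the CodePage argument of
MultiByteToWideChar or WideCharToMultiByte. A real implementation would
need a far larger table, ideally generated from the registries above.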

> So apparently there doesn't exist a portable API on top of this.

Not yet. Feel free to propose one.

> P.S.: About the cwstring unit, I experimented that nl_langinfo(CODESET) always
> gives ANSI_X3.4-1968 (US-ASCII), unless you call setlocale(LC_CTYPE, ...)
> before (... here being an explicit locale name, or empty string to use the
> locale environment variables setting). So the widestring manager don't work as
> expected...

Okay, noted.
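
For reference, the behaviour described in the P.S. can be checked with a
few lines of C on a POSIX system. This is only a sketch; the function
names are made up for the example. A C program starts in the "C" locale,
so nl_langinfo(CODESET) reports that locale's codeset until setlocale is
called, and an empty locale name tells setlocale to use the environment
variables.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Codeset name of the current LC_CTYPE locale, as reported by libc. */
const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}

/* Adopt LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG).
 * Returns 1 on success, 0 if the requested locale is not available. */
int adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Print the codeset before and after adopting the environment locale. */
void report_codeset(FILE *out)
{
    fprintf(out, "before setlocale: %s\n", current_codeset());
    if (!adopt_environment_ctype())
        fprintf(out, "setlocale failed, locale unchanged\n");
    fprintf(out, "after setlocale:  %s\n", current_codeset());
}
```

Calling report_codeset(stdout) at program start should show the default
codeset on the first line (ANSI_X3.4-1968 according to the report above)
and the codeset of the user's locale on the last line, provided that
locale is installed.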

Daniël

