[fpc-pascal] cwstrings and uclibc

Marco van de Voort marcov at stack.nl
Mon Sep 15 15:01:55 CEST 2008


In our previous episode, Graeme Geldenhuys said:
> >
> > http://www.freepascal.org/docs-html/rtl/system/twidestringmanager.html
> >
> > The hardest parts to implement ourselves are ansi to widestring
> > (because there are dozens of possible encodings for ansi) and
> > uppercase/lowercase.
> 
> OK, but out of the list of encodings supported by iconv, how many are
> actually used?

At least UTF-8, UTF-16 and about 20 other encodings (all the European and
Russian ones, and those for the most popular Middle Eastern and East Asian
languages).

> So start with the most used encodings and implement others as
> required. Encoding conversions don't need to be perfect from the
> start.

And then everybody has to lug all those tables around in their binaries.
The point is that iconv is already installed and already has those tables.
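To illustrate what "iconv already has those tables" means, here is a rough C sketch that hands a conversion to the system iconv through the POSIX iconv_open/iconv/iconv_close calls. The helper name convert_with_iconv is made up for this example, and this is not how cwstrings itself is written. The point is only that the program carries no encoding tables; it just names the source and target encodings.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated string from encoding
 * `from` to encoding `to` using the system iconv. The output is NOT
 * NUL-terminated. Returns the number of output bytes, or (size_t)-1 if
 * the encoding pair is unknown, the input is invalid, or `out` is too small. */
size_t convert_with_iconv(const char *to, const char *from,
                          const char *in, char *out, size_t out_size)
{
    iconv_t cd = iconv_open(to, from);     /* e.g. to "UTF-8" from "ISO-8859-1" */
    if (cd == (iconv_t)-1)
        return (size_t)-1;                 /* this iconv does not know the pair */

    char *inp = (char *)in;                /* POSIX iconv() takes char ** */
    size_t in_left = strlen(in);
    char *outp = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    if (rc != (size_t)-1)                  /* flush shift state (stateful encodings) */
        rc = iconv(cd, NULL, NULL, &outp, &out_left);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : out_size - out_left;
}
```

For example, converting the Latin-1 string "caf\xE9" from "ISO-8859-1" to "UTF-8" should give the 5 bytes "caf\xC3\xA9". Which encoding names are accepted depends on the iconv implementation that is installed.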

I see the Pascal-only attempts mainly as a solution for embedded systems
that don't need iconv and only have to encode/decode a limited set of
encodings, not as a substitute for iconv.
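By contrast, a Pascal-only style converter for an embedded target would hard-code just the few encodings it needs. A minimal sketch (in C, with hypothetical function names), assuming the only legacy encoding required is ISO-8859-1, which needs no table at all because each Latin-1 byte value equals its Unicode code point:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal decoder for one fixed encoding, ISO-8859-1.
 * Byte value == code point (U+0000..U+00FF), so no lookup table is needed.
 * Returns the number of UTF-16 units written, or (size_t)-1 if out is too small. */
size_t latin1_to_utf16(const unsigned char *in, size_t in_len,
                       uint16_t *out, size_t out_cap)
{
    if (out_cap < in_len)
        return (size_t)-1;
    for (size_t i = 0; i < in_len; i++)
        out[i] = in[i];                    /* always a single BMP unit */
    return in_len;
}

/* Reverse direction: anything outside Latin-1 becomes '?'
 * (a surrogate pair therefore becomes two '?'). */
size_t utf16_to_latin1(const uint16_t *in, size_t in_len,
                       unsigned char *out, size_t out_cap)
{
    if (out_cap < in_len)
        return (size_t)-1;
    for (size_t i = 0; i < in_len; i++)
        out[i] = in[i] <= 0xFF ? (unsigned char)in[i] : '?';
    return in_len;
}
```

Every additional encoding beyond this trivial one needs its own mapping table, which is exactly the binary-size cost mentioned above.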
 
> Am I understanding this correctly? Widestring is more for MBCS than
> Unicode.

No. Widestring was originally UCS-2, which is UTF-16 without surrogate
pairs, and was upgraded to full UTF-16 with some Windows 2000 fixpack.
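The UCS-2/UTF-16 difference in a nutshell: code points up to U+FFFF are a single 16-bit unit in both, while UTF-16 additionally encodes U+10000..U+10FFFF as a surrogate pair. A small sketch of that encoding step (the function name is illustrative only):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is not
 * a valid scalar value. UCS-2 only has the 1-unit case. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogate range: not a character */
        return 0;
    if (cp <= 0xFFFF) {                    /* BMP: identical to UCS-2 */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                  /* supplementary planes: surrogate pair */
        cp -= 0x10000;                     /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
        return 2;
    }
    return 0;                              /* beyond U+10FFFF */
}
```

For example, U+20AC (euro sign) gives one unit, 0x20AC, while U+1F600 gives the pair 0xD83D 0xDE00, which a UCS-2-only implementation would treat as two separate characters.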

One can assume that up-to-date Windows 2000/XP/Vista systems roughly
support UTF-16.

> MBCS rely on encodings? Unicode strings like UTF-8 and
> UTF-16 do not have such encoding issues (the number of possible
> encodings)?

AFAIK MBCS text is stored in an ansistring (an array of 1-byte chars). It
is more like UTF-8, but not Unicode-related: it's an extension of the
codepage system to allow codepages with more than 256 entries, for East
Asian use.
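To show what "MBCS in an ansistring" means in practice: the character count differs from the byte count, and you can only tell where characters start by inspecting lead bytes. A sketch assuming Windows code page 932 (Shift_JIS), whose lead bytes are 0x81..0x9F and 0xE0..0xFC; other MBCS code pages use other ranges, and the function names are hypothetical:

```c
#include <stddef.h>

/* Assumed lead-byte ranges of code page 932 (Shift_JIS). */
static int cp932_is_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Count characters in an MBCS byte string: a lead byte plus the following
 * trail byte form one character; every other byte is one character.
 * A lead byte at the very end (truncated input) counts as one character. */
size_t mbcs_char_count(const unsigned char *s, size_t len)
{
    size_t chars = 0;
    for (size_t i = 0; i < len; chars++)
        i += (cp932_is_lead_byte(s[i]) && i + 1 < len) ? 2 : 1;
    return chars;
}
```

Note that a byte which looks like ASCII may be a trail byte, so such strings cannot safely be scanned byte by byte without knowing the code page.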

Once you have Unicode, you probably don't need MBCS except for importing
legacy text in the Far East. And that doesn't need to be integrated into
the system; it can be handled by custom code (depending on the complexity).
