[fpc-devel] ansistrings and widestrings

Fri Jan 7 02:38:21 CET 2005

peter green wrote:
> 
> ok i see a MAJOR problem with the semantics of those functions.
> 
> they assume that one widechar is equivalent to one ansichar (that is, the
> source count of widechars will equal the destination count of ansichars, or
> the source count of ansichars will equal the destination count of
> widechars).
> 
> this is simply not the case for many encodings (UTF-8, SJIS, EUC, to name
> just a few).

I came across such problems in another project (CrossPoint). IMO the
best solution is a separation into true fixed-width strings (1, 2,
perhaps 4 bytes per char), and a true string class for more general
encodings. The string class(es) can then also include proper support for
code pages, MBCS, 7-bit codes, MIME etc.
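
To make the size mismatch concrete, here is a small sketch in Python
(not Pascal, purely for illustration). It compares the number of
characters in a string with the number of bytes each encoding needs.
The helper name encoded_sizes and the sample text are my own
assumptions, not anything from the RTL.

```python
# Illustrative sketch: character count versus encoded byte count.
# The sample is three kanji (U+65E5, U+672C, U+8A9E) followed by "abc".
text = "\u65e5\u672c\u8a9eabc"

def encoded_sizes(s, encodings=("utf-8", "shift_jis", "euc_jp", "utf-32-le")):
    """Return a dict mapping each encoding name to the byte length of s."""
    return {enc: len(s.encode(enc)) for enc in encodings}

sizes = encoded_sizes(text)
print(len(text), sizes)
```

Only the UTF-32 size is a fixed multiple of the character count; for the
other encodings the ratio depends on the text itself, so a conversion
routine cannot derive the destination length from the source length.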

The only universal international representation for strings that doesn't
require any conversions is Unicode with one fixed-size element per code
point (UCS-4/UTF-32, currently 32 bit). UTF-8, UTF-16 and other
variable-width encodings can save memory, but only at the cost of runtime
overhead; that's why I'd wrap these into classes.
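
The runtime overhead shows up as soon as you index by character. A rough
sketch, again in Python for illustration only (both function names are
hypothetical): with 4 bytes per char the n-th character sits at a known
offset, while in UTF-8 you have to scan for the start of each character.

```python
def nth_char_utf32(data: bytes, n: int) -> str:
    """Fixed width: character n starts at byte offset 4 * n."""
    return data[4 * n: 4 * n + 4].decode("utf-32-le")

def nth_char_utf8(data: bytes, n: int) -> str:
    """Variable width: scan the buffer to find where each character starts.

    Continuation bytes have the bit pattern 10xxxxxx; every other byte
    begins a new character.
    """
    starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    begin = starts[n]
    end = starts[n + 1] if n + 1 < len(starts) else len(data)
    return data[begin:end].decode("utf-8")

sample = "a\u00e9\u65e5b"  # 1-, 2-, 3- and 1-byte characters in UTF-8
```

The UTF-8 version is linear in the length of the buffer, the UTF-32
version is constant time. A class can hide that cost (or cache the
offsets), which a plain array-like string type cannot.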

Delphi uses AnsiString for both single-byte and multi-byte character
strings, and I'm not sure whether WideChar (as used by Windows) is
fixed-width 16-bit Unicode (UCS-2) or UTF-16. In international applications
(mail!) the handling of such strings can become a mess when the
assumptions about the encoding of some string (code page...) don't hold.
If records end up being used to hold strings together with an indication
of the actual encoding, then a dedicated standard string class would be a
better solution.
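
What I have in mind looks roughly like the following. It is a minimal
sketch in Python rather than Pascal, and the class name EncodedString
and its methods are hypothetical, not an existing API: the bytes and
their encoding travel together, and every operation goes through the
declared encoding instead of guessing.

```python
class EncodedString:
    """Hypothetical sketch: raw bytes plus the name of their encoding."""

    def __init__(self, data: bytes, encoding: str):
        # Catches only labels under which the bytes fail to decode;
        # a wrong but decodable label still passes.
        data.decode(encoding)
        self.data = data
        self.encoding = encoding

    def __len__(self) -> int:
        # Length in characters (code points), not in bytes.
        return len(self.data.decode(self.encoding))

    def convert(self, target: str) -> "EncodedString":
        """Re-encode via Unicode; fails if target cannot represent the text."""
        text = self.data.decode(self.encoding)
        return EncodedString(text.encode(target), target)

    def __eq__(self, other) -> bool:
        # Compare the text, regardless of how each side is stored.
        if not isinstance(other, EncodedString):
            return NotImplemented
        return self.data.decode(self.encoding) == other.data.decode(other.encoding)

# Usage: the same two kanji stored as Shift-JIS and as UTF-8.
sjis = EncodedString("\u65e5\u672c".encode("shift_jis"), "shift_jis")
utf8 = sjis.convert("utf-8")
```

Here both objects report a length of 2 characters and compare equal,
although one holds 4 bytes and the other 6. That is exactly the
bookkeeping that gets lost when the encoding is only an assumption in
the programmer's head.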

DoDi