[fpc-devel] ansistrings and widestrings

Florian Klaempfl F.Klaempfl at gmx.de
Fri Jan 7 10:41:16 CET 2005


DrDiettrich wrote:

> peter green wrote:
> 
>>OK, I see a MAJOR problem with the semantics of those functions.
>>
>>They assume that one widechar is equivalent to one ansichar (that is, the
>>source count of widechars will equal the destination count of
>>ansichars).
>>
>>This is simply not the case for many encodings (UTF-8, SJIS and EUC, to
>>name just a few).
> 
> 
> I came across such problems in another project (CrossPoint). IMO the
> best solution is a separation into true fixed-char strings (1, 2, 4?
> byte/char), and a true string class for more general encodings. The
> string class(es) then also can include proper support for code pages,
> MBCS, 7-bit codes, MIME etc.
> 
> The only universal international representation for strings is Unicode
> (currently 32 bit), that doesn't require any conversions. 

That's not true. E.g. the German umlauts can be represented by two code 
points even when using UTF-32 (the base char followed by the combining 
two dots); the same applies to a lot of other languages.
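
To illustrate the point about umlauts, here is a minimal sketch in Python
(not FPC code; the helper name utf32_units is made up for this example). It
only assumes the standard unicodedata module and shows that the same
visible "ä" can occupy one or two UTF-32 units depending on whether it is
stored precomposed or as base char plus combining diaeresis.

```python
import unicodedata

# "ä" stored as a single precomposed code point (U+00E4).
precomposed = "\u00e4"

# The same visible character stored as 'a' followed by the
# combining diaeresis (U+0308), obtained via NFD normalization.
decomposed = unicodedata.normalize("NFD", precomposed)


def utf32_units(s):
    """Hypothetical helper: number of 32-bit units needed to store s as UTF-32."""
    return len(s.encode("utf-32-le")) // 4


print(utf32_units(precomposed))   # units for the precomposed form
print(utf32_units(decomposed))    # units for the decomposed form
print(precomposed == decomposed)  # naive comparison of the two forms
```

So even with a fixed 32 bits per code point, "one array element" and "one
character as the user sees it" are not the same thing.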

> UTF and other

UTF-8 is Unicode as well; Unicode is a standard which describes character 
mappings and encodings, among other things.
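
As a rough sketch of what that means in practice (Python again, purely
illustrative; the sample word is my own choice): the same sequence of
Unicode code points can be serialized as UTF-8, UTF-16 or UTF-32. Only the
byte counts differ, and every form decodes back to the same text.

```python
# "Grüße": five code points, two of them outside ASCII.
text = "Gr\u00fc\u00dfe"

encodings = ("utf-8", "utf-16-le", "utf-32-le")

# Byte length of the same text in each encoding form.
sizes = {enc: len(text.encode(enc)) for enc in encodings}

# Every encoding form decodes back to the identical code point sequence.
round_trips = all(text.encode(enc).decode(enc) == text for enc in encodings)

for enc in encodings:
    print(enc, sizes[enc])
print(round_trips)
```

Note that in UTF-8 the byte count differs from the character count, which
is exactly the one-widechar-equals-one-ansichar problem raised earlier in
the thread.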

> encodings can save memory, but only at the cost of runtime overhead,
> that's why I'd wrap these into classes.
> 
> Delphi uses AnsiString for both single and multi byte character strings,
> and I'm not sure whether WideChar (as used by Windows) is Unicode-16 or
> UTF-16. In international applications (mail!) the handling of such
> strings can become a mess, when the assumptions about the encoding of
> some string (code page...) don't hold. When consequently records are
> used to hold strings together with an indication of the actual encoding,
> then a dedicated standard string class would be a better solution.

Encoding isn't the main problem; you need dedicated procedures and 
functions for Unicode comparison, upper/lower case conversion etc. 
Achieving this in a platform-independent way is very hard ...
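
A small sketch of why simple per-char routines are not enough (Python, not
a proposal for the RTL; caseless_equal is a hypothetical helper). It assumes
only str.upper, str.casefold and unicodedata.normalize from the standard
library, and it ignores locale-specific rules and collation order, which
would need something like a full collation library.

```python
import unicodedata


def caseless_equal(a, b):
    """Hypothetical helper: compare two strings ignoring case and
    ignoring whether characters are precomposed or decomposed."""
    def key(s):
        # Normalize, fold case, then normalize again because case
        # folding can itself produce non-normalized sequences.
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)
    return key(a) == key(b)


word = "stra\u00dfe"  # "straße"

# Upper-casing changes the length: "ß" becomes "SS".
print(len(word), len(word.upper()))

# A naive lower-case comparison fails for precomposed vs decomposed forms.
print("\u00c4".lower() == "a\u0308")

# The normalization-aware comparison treats them as equal.
print(caseless_equal("\u00c4", "a\u0308"))
print(caseless_equal("STRASSE", word))
```

Upper-casing can change the string length and equal-looking strings can
differ code point by code point, so a char-by-char table lookup does not
cover it.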




