[fpc-devel] Delphi new AnsiStrings are incredibly broken :-(

Hans-Peter Diettrich DrDiettrich1 at aol.com
Thu Oct 13 15:25:19 CEST 2011


Following the test program that Paul sent, I experimented further with 
AnsiStrings in Delphi XE, with catastrophic results :-(

As soon as MBCS encodings enter the scene, incredible bugs show up. This 
matters because UTF-8 is widely used in FPC and Lazarus and is the 
preferred string encoding on Linux. With
var
   a: AnsiString;
   u, u2: UTF8String;
even a simple assignment of
   u := 'ü';
results in a string of Length 1; the second byte has gone away.
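
For reference, the expected lengths can be checked outside of Delphi. The following is only a sketch in Python (not Pascal), and it assumes that the ANSI code page is Windows-1252, which the post does not state:

```python
# Assumption: the ANSI code page is Windows-1252 (not stated in the post).
char = "ü"

cp1252_bytes = char.encode("cp1252")  # single-byte ANSI representation
utf8_bytes = char.encode("utf-8")     # multi-byte UTF-8 representation

# A correctly converted UTF8String holding 'ü' should report Length 2.
print(len(cp1252_bytes), len(utf8_bytes))
```

Under that assumption a Length of 1 matches the single-byte ANSI form, not the UTF-8 form.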

A direct comparison of
   a := 'äöüü';
   u := a;
   WriteLn(a = u);
will show either True or False, and I don't know why.

Also
   Pos('ü', u); //0
   Pos(a[3], u); //0

Since u := a[3] doesn't work,
   u2 := Copy(u, 5, 2); //'ü'
   WriteLn(Pos(u2, u)); //5 - correct!
but
   WriteLn(Pos(string(u2), u)); //3
returns the index in the UnicodeString, into which u was silently converted.
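
The two index values above (5 and 3) follow directly from the encodings. A small Python sketch of the arithmetic, purely illustrative and not a model of the Delphi RTL:

```python
text = "äöüü"
needle = "ü"

# UTF-8: every character here occupies two bytes.
utf8_text = text.encode("utf-8")
utf8_needle = needle.encode("utf-8")

# 1-based byte index, comparable to Pos(u2, u) on the UTF8String.
byte_index = utf8_text.find(utf8_needle) + 1

# 1-based character index. All characters here are in the BMP, so this
# equals the UTF-16 code unit index that Pos yields on a UnicodeString.
utf16_index = text.find(needle) + 1

print(byte_index, utf16_index)
```

The second value is useless as an index into u, because u is indexed in bytes.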


Given all these flaws, I see no use for Delphi-compatible AnsiString 
procedures in an environment where MBCS (UTF-8) strings must be handled.

If we ever want to proceed with the new AnsiStrings, we should specify 
what every RTL procedure exactly *should* do, in a meaningful and usable 
way, regardless of Delphi compatibility.

E.g. Pos() should convert the first argument (SubStr) to the encoding of 
the second string, if the two encodings differ, before searching for the 
SubStr. Only then can the result be used as an index into the second string.
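
The proposed behaviour can be sketched as follows. This is Python, not RTL code; the function name pos_converted and the explicit encoding parameters are made up for the illustration, and Windows-1252 is again only an assumed ANSI code page:

```python
def pos_converted(substr: bytes, substr_enc: str, s: bytes, s_enc: str) -> int:
    """Return the 1-based byte index of substr in s, or 0 if not found.

    substr is first converted to the encoding of s, so the result is
    usable as an index into s. Hypothetical helper, for illustration only.
    """
    if not substr:
        return 0  # Pascal's Pos returns 0 for an empty SubStr
    if substr_enc != s_enc:
        # May raise UnicodeEncodeError if s_enc cannot represent substr.
        substr = substr.decode(substr_enc).encode(s_enc)
    return s.find(substr) + 1


haystack = "äöüü".encode("utf-8")   # plays the role of u
needle = "ü".encode("cp1252")       # plays the role of a[3]

index = pos_converted(needle, "cp1252", haystack, "utf-8")
unconverted = haystack.find(needle) + 1  # raw byte search, no conversion

print(index, unconverted)
```

With the conversion, the result indexes the UTF-8 bytes of the second string; the raw byte search finds nothing, because the single ANSI byte does not occur in the UTF-8 data.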


Now we can guess whether the flawed handling of AnsiStrings in Delphi is 
due to sloppiness of the implementors, or whether necessary conversions 
are omitted for speed reasons. Whenever there is a chance that Pos or 
other standard functions must convert an argument, the use of strings 
with different encodings becomes very questionable, performance-wise. 
Perhaps it would be better (and easier to implement) if only one string 
type were used in an application, with possible values of 
native/UTF-8/UTF-16. Required conversions could then be restricted to 
I/O methods (file encoding), ShortString conversions (which codepage???), 
and external subroutines (OS, widgetsets).

DoDi



