[fpc-devel] Encoded AnsiString

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Jan 7 15:35:34 CET 2014


Michael Van Canneyt wrote:

> If you want a TStrings that can hold strings which may differ in their 
> encoding (i.e. strings[0] has a different encoding from strings[1]) then 
> you'll be left in the cold.

Just an idea:
What if FPC added another encoding, similar to RawByteString ($FFFF), but
without the Delphi quirks? Or simply fixed the RawByteString flaws in the
*Ansi* compiler and RTL?

1) In a discussion in the Embarcadero groups it turned out that, when a
RawByteString is assigned to another AnsiString type, the Delphi
compiler should (but does not) check the string and, if necessary,
convert it to the static encoding of the target. This omission is
(almost) the only way to create strings whose static and dynamic
encodings differ.

2) The stupid conversion to CP_ACP in an assignment *to* a
RawByteString should be dropped. This applies in particular to
assignments to *function results*.

3) In functions accepting RawByteString parameters, the declared
function result type should be honored. The Delphi compiler seems to
*assume* that the result of such a function is a RawByteString, so that
(together with the flaws mentioned above) the outcome is a CP_ACP
string, even if the declared result type is, e.g., UTF8String.

Test case:
   function conc(a,b: RawByteString): UTF8String;
   begin Result := a+b; end;
Just as with
   function conc(a,b: RawByteString): RawByteString;
   begin Result := a+b; end;
the returned string has CP_ACP encoding :-(
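A minimal sketch for checking this test case with FPC 3.x (assuming {$mode objfpc}; not verified against any particular Delphi version). It prints the dynamic code page of the result via StringCodePage. The names ToUTF8 and concFixed are hypothetical, not RTL routines: they perform the check-and-convert step from point 1 by hand with SetCodePage, as one possible workaround until the compiler does it itself.

```pascal
program ConcCodePage;
{$mode objfpc}{$H+}

// The function from the test case: RawByteString parameters,
// declared UTF8String result.
function conc(a, b: RawByteString): UTF8String;
begin
  Result := a + b;
end;

// Hypothetical helper: does explicitly what point 1 says the compiler
// should do, i.e. convert to the static encoding of the target (CP_UTF8)
// whenever the dynamic encoding differs.
function ToUTF8(s: RawByteString): UTF8String;
begin
  if (s <> '') and (StringCodePage(s) <> CP_UTF8) then
    SetCodePage(s, CP_UTF8, True); // convert the bytes, then relabel
  Result := s; // dynamic code page now matches the declared result type
end;

// Hypothetical workaround: same as conc, but converts explicitly.
function concFixed(a, b: RawByteString): UTF8String;
begin
  Result := ToUTF8(a + b);
end;

var
  r: UTF8String;
begin
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  r := conc('abc', 'def');
  WriteLn('conc      -> code page ', StringCodePage(r));
  r := concFixed('abc', 'def');
  WriteLn('concFixed -> code page ', StringCodePage(r));
end.
```

concFixed should report 65001 (CP_UTF8); the line for conc shows whatever code page the compiler in use actually produces, which can be compared against DefaultSystemCodePage.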


Once these flaws are fixed in the FPC compiler, strings of the
AnsiString types will always have matching static and dynamic
encodings, as it should be.

Then TStrings could be based on such RawByteStrings, without excess
conversions or data loss. Sorting (TStringList) should then possibly
ignore the dynamic encoding, i.e. work on a strictly binary
(byte-by-byte) basis.
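The byte-by-byte sorting suggested above can be sketched with TStringList.CustomSort (FPC, {$mode objfpc} assumed). RawCompare and ListRawCompare are hypothetical helpers, not part of the RTL; RawCompare looks only at the raw bytes and never at the dynamic code page. Note that TStringList items are still declared as plain string (CP_ACP AnsiString in the default ANSI RTL), so this only illustrates the comparison, not a RawByteString-based TStrings.

```pascal
program RawSort;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: compare two strings byte by byte,
// ignoring their dynamic code pages entirely.
function RawCompare(const a, b: RawByteString): Integer;
var
  i, n: SizeInt;
begin
  n := Length(a);
  if Length(b) < n then
    n := Length(b);
  for i := 1 to n do
    if Ord(a[i]) <> Ord(b[i]) then
      Exit(Ord(a[i]) - Ord(b[i]));
  // Common prefix is equal: the shorter string sorts first.
  if Length(a) < Length(b) then
    Result := -1
  else if Length(a) > Length(b) then
    Result := 1
  else
    Result := 0;
end;

// Adapter matching TStringListSortCompare for CustomSort.
function ListRawCompare(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := RawCompare(List[Index1], List[Index2]);
end;

var
  sl: TStringList;
  i: Integer;
begin
  sl := TStringList.Create;
  try
    sl.Add('b');
    sl.Add('B');
    sl.Add('a');
    sl.Add('ab');
    sl.CustomSort(@ListRawCompare);
    for i := 0 to sl.Count - 1 do
      WriteLn(sl[i]);
  finally
    sl.Free;
  end;
end.
```

With these sample items the loop prints B, a, ab, b, one per line: plain byte order puts uppercase ASCII before lowercase, and a string sorts before any longer string it is a prefix of.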

DoDi



