[fpc-devel] Encoded AnsiString

Jonas Maebe jonas.maebe at elis.ugent.be
Tue Jan 7 16:33:50 CET 2014


On 07 Jan 2014, at 15:35, Hans-Peter Diettrich wrote:

> 1) In a discussion in the Embarcadero groups it turned out that, in an assignment of a RawByteString to another AnsiString type, the Delphi compiler should (but does not) check the string and, if necessary, convert it to the static encoding of the target. This is (almost) the only way to create strings with a different static and dynamic encoding.
> 
> 2) The stupid conversion to CP_ACP in an assignment *to* a RawByteString should be dropped. This applies in particular to the assignment to *function results*.

The conversion does not happen for all assignments; it only happens for concatenations that are assigned to a RawByteString, and even then not always. Please read the wiki page I wrote (trying to prevent exactly this kind of incorrect statement from being repeated further, and obviously failing). I even mentioned there that we will probably add a way to change the behaviour in this specific case.
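
A minimal sketch for checking this, assuming a compiler with code page aware AnsiStrings (FPC 2.7.1 or later). The program name and the type name TCP866String are made up for the example. It prints the dynamic code page after a plain assignment, after a concatenation of strings with the same code page, and after a concatenation of strings with different code pages, so the behaviour can be observed rather than assumed:

```pascal
program rawconcat;
{$mode objfpc}{$H+}

// Assumption: on Unix a widestring manager is required for code page
// conversions, so cwstring is pulled in there.
{$ifdef unix}
uses
  cwstring;
{$endif}

type
  // Hypothetical type name, declared only for this example.
  TCP866String = type AnsiString(866);

var
  u1, u2: UTF8String;
  o: TCP866String;
  r: RawByteString;
begin
  u1 := 'abc';
  u2 := 'def';
  o := 'ghi';

  // Plain assignment to RawByteString.
  r := u1;
  WriteLn('plain assignment:      ', StringCodePage(r));

  // Concatenation of two strings with the same dynamic code page.
  r := u1 + u2;
  WriteLn('concat, same code page: ', StringCodePage(r));

  // Concatenation of strings with different dynamic code pages.
  r := u1 + o;
  WriteLn('concat, mixed:          ', StringCodePage(r));

  // For comparison: the code page that CP_ACP currently resolves to.
  WriteLn('DefaultSystemCodePage:  ', DefaultSystemCodePage);
end.
```

Only the last case is a candidate for the conversion to CP_ACP discussed here; the first one involves no concatenation at all.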

> 3) The function result type should be honored in functions accepting RawByteString parameters. The Delphi compiler seems to *assume* that the result of such functions is a RawByteString, so that (including the aforementioned flaws) the outcome is a CP_ACP string, even if the declared function result is e.g. a UTF8String.
> 
> Test case:
>  function conc(a,b: RawByteString): UTF8String;
>  begin Result := a+b; end;

This will always return CP_UTF8 on FPC. Does it really return CP_ACP on Delphi? Even if it does, I doubt we will change that. We couldn't even do that easily, because the RTL routine that handles the concatenation does not know the static code pages of the strings being concatenated.
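
To compare the two compilers, the quoted test case can be wrapped in a small program that prints the dynamic code page of the result. This is only a sketch: the program name and the variable names are made up, and the input strings are kept to plain ASCII so that any conversion cannot change the data.

```pascal
program conctest;
{$mode objfpc}{$H+}

// The test case from the quoted mail, unchanged apart from layout.
function conc(a, b: RawByteString): UTF8String;
begin
  Result := a + b;
end;

var
  s1, s2: AnsiString;
  res: UTF8String;
begin
  s1 := 'abc';
  s2 := 'def';
  res := conc(s1, s2);

  // Dynamic code page of the returned string.
  WriteLn('result code page:      ', StringCodePage(res));

  // Reference values to compare against.
  WriteLn('CP_UTF8:               ', CP_UTF8);
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

If the first line matches CP_UTF8, the declared result type was honoured; if it matches DefaultSystemCodePage instead, the result was treated as CP_ACP.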

> Then TStrings could be based on such RawByteStrings, without excess conversions or losses.

The problem with changing TStrings from AnsiString to RawByteString has less to do with the behaviour of RawByteString than with descendant classes in existing third-party (= user) source code that override methods taking AnsiString parameters. We don't want to force everyone to rewrite their code so that it uses RawByteString (if anything, RawByteString should probably be used as little as possible in user code, because always dealing correctly with all possible code pages is very hard).

> Sorting (TStringList) should possibly ignore the dynamic encoding, i.e. work on a strictly binary (byte-by-byte) basis.

Looking for just one second at the definition of the Sort methods of TStringList (and TStrings) would have prevented you from writing the above statement, which does not make any sense whatsoever (unless you want the compiler to start changing all code where a programmer passes a comparison function that does take code pages into account to the Sort methods of TStrings/TStringList).
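
For illustration, a sketch of how a programmer who wants strictly binary ordering can already get it by passing their own comparison function. The function name CompareBinary and the sample data are made up for the example; TStringList.CustomSort and CompareByte are the existing Classes and System routines.

```pascal
program bytesort;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical comparison callback: compares the raw bytes of two entries
// and ignores their code pages entirely.
function CompareBinary(List: TStringList; Index1, Index2: Integer): Integer;
var
  a, b: RawByteString;
  n: SizeInt;
begin
  a := List[Index1];
  b := List[Index2];

  // Compare the common prefix byte by byte.
  n := Length(a);
  if Length(b) < n then
    n := Length(b);
  Result := 0;
  if n > 0 then
    Result := Integer(CompareByte(a[1], b[1], n));

  // Equal prefix: the shorter string sorts first.
  if Result = 0 then
    Result := Integer(Length(a) - Length(b));
end;

var
  sl: TStringList;
  i: Integer;
begin
  sl := TStringList.Create;
  try
    sl.Add('b');
    sl.Add('a');
    sl.Add('B');
    // The caller, not the compiler, decides how entries are compared.
    sl.CustomSort(@CompareBinary);
    for i := 0 to sl.Count - 1 do
      WriteLn(sl[i]);
  finally
    sl.Free;
  end;
end.
```

A different callback that does take code pages into account can be passed in exactly the same way, which is why the ordering cannot be fixed to one behaviour behind the programmer's back.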


Jonas

