[fpc-devel] Performance of string handling in trunk
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Tue Jun 25 01:05:43 CEST 2013
Sven Barth wrote:
> On 24.06.2013 16:44, Hans-Peter Diettrich wrote:
>>> I hope, now I understand that the type RawByteString ( = String
>>> ($FFFF) ) means "codesize = 1 Byte, never to be auto-converted to any
>>> differently encoded String type variable".
>>
>> No. Even if I would like such an encoding, too, Delphi doesn't implement
>> it.
>
> But he is right. RawByteString is defined in unit system as
> AnsiString(CP_NONE) where CP_NONE is defined as $FFFF. This means that
> no conversions to or from a variable of this type are done (or any other
> AnsiString type that has code page $FFFF)
Well, after some tests it looks more complicated to me.
A RawByteString can take on any encoding, so no conversion is needed
when assigning to it. But when it is assigned back to a UnicodeString,
the encoding it has taken on is used to convert the string.
In fact it looks like only the string pointers are copied between
AnsiString and RawByteString, with the refcount changed accordingly.
This can lead to strange results (in XE). As soon as an AnsiString has
obtained a different encoding, no further conversions seem to occur.
Once I copy an OEMString (cp 437) into a RawByteString, and from there
into an AnsiString, the AnsiString has taken on the OEM encoding.
Appending further strings of different code pages to it only
concatenates the strings, without any conversion; the encoding is
still reported as OEM. This means that the encoding of an AnsiString is
not guaranteed to be the defined one, nor even a single consistent one!
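The experiment described above could be reproduced with something like
the following sketch. The declaration of OEMString, the program name and
the test data are assumptions of mine for illustration; StringCodePage
is the System routine that reports the code page stored in the string
header. I give no expected output on purpose, since the behaviour is
exactly what is in question and may differ between compilers.

```pascal
program RawByteProbe;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Assumption: OEMString is declared roughly like this.
  OEMString = type AnsiString(437);

var
  O: OEMString;
  R: RawByteString;
  A, B: AnsiString;
  U: UnicodeString;
begin
  // Non-ASCII test data given as code points, so the encoding of the
  // source file does not matter (a-umlaut, o-umlaut, u-umlaut).
  U := #$00E4#$00F6#$00FC;

  O := OEMString(U);   // converted to code page 437
  R := O;              // RawByteString: no conversion expected
  A := R;              // does A keep its declared code page?

  Writeln('O code page: ', StringCodePage(O));
  Writeln('R code page: ', StringCodePage(R));
  Writeln('A code page: ', StringCodePage(A));
  // True here would indicate that only the pointer was copied.
  Writeln('A shares buffer with O: ', Pointer(A) = Pointer(O));

  B := AnsiString(U);  // same text in the default ANSI code page
  A := A + B;          // is B converted before the concatenation?
  Writeln('A code page after concat: ', StringCodePage(A));
  Writeln('A length after concat: ', Length(A));
end.
```

If A reports 437 after the assignment from R, and still does after the
concatenation, that would match what I see in XE.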
Can somebody test this with a newer Delphi version?
Resetting such an ill-behaved AnsiString seems to require a direct
assignment from another AnsiString variable, whereupon the AnsiString
returns to its *defined* encoding and resumes any conversions to that
encoding that may be required.
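A minimal sketch of that reset, under the same assumptions as before
(OEMString declared as AnsiString(437); the variable names are made up).
It only prints the reported code pages before and after the direct
assignment, next to DefaultSystemCodePage for comparison.

```pascal
program ResetProbe;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Assumption: OEMString is declared roughly like this.
  OEMString = type AnsiString(437);

var
  O: OEMString;
  R: RawByteString;
  A, Fresh: AnsiString;
  U: UnicodeString;
begin
  U := #$00E4#$00F6#$00FC;

  O := OEMString(U);
  R := O;
  A := R;                    // A may now report the OEM code page
  Writeln('before reset: ', StringCodePage(A));

  Fresh := AnsiString(U);    // an ordinary AnsiString
  A := Fresh;                // direct AnsiString-to-AnsiString assignment
  Writeln('after reset:  ', StringCodePage(A));
  Writeln('default code page: ', DefaultSystemCodePage);
end.
```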
DoDi