[fpc-devel] Performance of string handling in trunk

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Jun 25 01:05:43 CEST 2013


Sven Barth wrote:
> On 24.06.2013 16:44, Hans-Peter Diettrich wrote:
>>> I hope I now understand that the type RawByteString (= String
>>> ($FFFF)) means "code size = 1 byte, never to be auto-converted to any
>>> differently encoded String type variable".
>>
>> No. Even though I would like such an encoding too, Delphi doesn't
>> implement it.
> 
> But he is right. RawByteString is defined in unit System as
> AnsiString(CP_NONE), where CP_NONE is defined as $FFFF. This means that
> no conversions are done to or from a variable of this type (or of any
> other AnsiString type that has code page $FFFF).

Well, after some tests it looks more complicated to me.

A RawByteString can take on any encoding, so no conversion is required
when assigning to it. But when it is assigned back to a UnicodeString,
the encoding it has taken on is used to convert the string.
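This first observation can be checked with a small console program. The following is only a sketch, assuming a code-page-aware compiler (Delphi 2009 or later, or an FPC version with code page aware AnsiStrings); the program and variable names are made up for illustration, and no output is given because the result is exactly what is being tested.

```pascal
program RawByteStringCodePage;

{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  U8: UTF8String;      // declared code page CP_UTF8 (65001)
  R: RawByteString;    // declared code page CP_NONE ($FFFF)
  U: UnicodeString;
begin
  U8 := 'abc';
  R := U8;                 // no conversion: R takes over the code page of U8
  WriteLn('StringCodePage(R) = ', StringCodePage(R));
  U := UnicodeString(R);   // conversion uses the code page stored in R
  WriteLn('U = ', U);
end.
```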

In fact, it looks like only the string pointers are copied between
AnsiString and RawByteString, with the reference count adjusted
accordingly. This can lead to strange results (in XE). Once an
AnsiString has taken on a different encoding, no further conversions
seem to occur. When I copy an OEMString (cp 437) into a RawByteString,
and from there into an AnsiString, the AnsiString takes on the OEM
encoding. Appending further strings with different code pages to it only
concatenates them, without any conversion, and the encoding is still
reported as OEM. This means that the encoding of an AnsiString is not
guaranteed to be its declared one, nor even a single, uniform one!

Can somebody test this with a newer Delphi version?

Resetting such an ill-behaved AnsiString seems to require a direct
assignment from another AnsiString variable, whereupon the AnsiString
returns to its *declared* encoding and resumes any required conversions
to that encoding.
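For anyone who wants to repeat the test with another Delphi or FPC version, the steps above might be sketched like this. It is a hypothetical reconstruction, not the exact test code: the declaration of OEMString as `type AnsiString(437)` and all variable names are assumptions, plain ASCII literals are used to avoid source-encoding issues, and only the reported code pages are printed, with no expected values claimed.

```pascal
program AnsiStringCodePageTest;

{$IFDEF FPC}{$MODE DELPHI}{$H+}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Assumed declaration: a string type bound to the OEM code page 437
  OEMString = type AnsiString(437);

var
  O: OEMString;
  R: RawByteString;
  A, B: AnsiString;    // declared code page: the system ANSI code page
  U8: UTF8String;
begin
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);

  O := 'abc';
  R := O;              // R takes over cp 437 without conversion
  A := R;              // does A now report cp 437?
  WriteLn('after A := R:      ', StringCodePage(A));

  U8 := 'def';
  A := A + U8;         // append a string with a different code page
  WriteLn('after A := A + U8: ', StringCodePage(A));

  B := 'ghi';
  A := B;              // direct assignment from another AnsiString variable
  WriteLn('after A := B:      ', StringCodePage(A));
end.
```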

DoDi
