[fpc-devel] Performance of string handling in trunk

Michael Schnell mschnell at lumino.de
Wed Jun 26 13:59:17 CEST 2013


I think the implementation would be quite easy, straightforward, fast 
and compatible.

  - The compiler knows the static encoding type of each string variable.
  - The dynamic encoding type of a String is preset to the static 
encoding type when the string is allocated.
  - Only RawByteStrings (EncodingType $FFFF) are allowed to change their 
dynamic encoding type; with other Strings this will lead to 
unpredictable results.

When Strings are assigned:
  - If the static encoding type of source and target is identical (be it 
normal or RAW; already checked by the compiler) -> the same happens as 
with the pre-Unicode compiler (setting the pointer to the StringRecord 
and managing the RefCount).
  - If the target is statically defined as RawByteString (already 
checked by the compiler) -> the same pointer assignment happens.
  - If the source is statically defined as RawByteString (already 
checked by the compiler), code is generated that checks whether the 
dynamic encoding of the source is identical to the (known to the 
compiler) static encoding type of the target; if so -> the same 
pointer assignment happens.

Otherwise the conversion library is called. It checks the _dynamic_ 
encoding type of source and target (thus it only needs to be provided 
with the Strings themselves and no additional information generated by 
the compiler) and does the conversion appropriately.
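
The assignment rules above can be modelled in a few lines. This is only 
a sketch in Python, not FPC code: StrRec, convert and assign are 
hypothetical names, the two code pages are arbitrary examples, and the 
target encoding is passed explicitly instead of being read from an 
already allocated target string.

```python
from dataclasses import dataclass

CP_NONE = 0xFFFF   # RawByteString marker ($FFFF in the post)
CP_UTF8 = 65001    # example code pages, chosen only for illustration
CP_1252 = 1252

CODECS = {CP_UTF8: "utf-8", CP_1252: "cp1252"}


@dataclass
class StrRec:
    """Stand-in for the string record: payload plus dynamic encoding."""
    data: bytes
    dyn_cp: int


def convert(src: StrRec, target_cp: int) -> StrRec:
    """Hypothetical conversion library: looks only at dynamic encodings."""
    if src.dyn_cp == CP_NONE or target_cp == CP_NONE:
        raise ValueError("cannot convert from or to encoding $FFFF")
    text = src.data.decode(CODECS[src.dyn_cp])
    return StrRec(text.encode(CODECS[target_cp]), target_cp)


def assign(src: StrRec, src_static: int, dst_static: int) -> StrRec:
    """Return the record the target variable points to after assignment."""
    # Rule 1: identical static encodings, decided at compile time.
    if src_static == dst_static:
        return src                      # share the record, no conversion
    # Rule 2: the target is a RawByteString, decided at compile time.
    if dst_static == CP_NONE:
        return src                      # share the record, no conversion
    # Rule 3: the source is a RawByteString, needs a runtime check.
    if src_static == CP_NONE and src.dyn_cp == dst_static:
        return src                      # share the record, no conversion
    # Everything else goes through the conversion library.
    return convert(src, dst_static)
```

Returning the same object stands for setting the pointer and adjusting 
the RefCount; only the last line of assign creates a new record.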

When doing an operation on two Strings (such as "+" or a comparison), 
one of the operands is (virtually) copied to a String with the same 
encoding type as the other.

  - If one operand is a RawByteString, use the (static or dynamic) 
encoding of the other.
  - If both are RawByteStrings, use the dynamic encoding of one of them 
(presumably this is not a separate case from the one above).

If the conversion library sees a dynamic encoding type of $FFFF for 
either source or target it will fail and issue an exception.
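
The operand rules and the $FFFF failure can be sketched the same way. 
Again this is a Python model with hypothetical names (pick_encoding, 
needs_conversion). Two points are assumptions that the text above 
leaves open: for two typed operands the left one's encoding is used, 
and an operand that already has the chosen encoding never reaches the 
conversion library.

```python
CP_NONE = 0xFFFF   # RawByteString marker ($FFFF in the post)


def pick_encoding(a_static: int, a_dyn: int,
                  b_static: int, b_dyn: int) -> int:
    """Choose the encoding both operands are brought to for '+' or compare."""
    if a_static == CP_NONE and b_static == CP_NONE:
        # Both raw: take the dynamic encoding of one of them (here: a).
        return a_dyn
    if a_static == CP_NONE:
        # Only a is raw: use the encoding of the other operand.
        return b_static
    # Either only b is raw, or both are typed. For two typed operands the
    # choice of the left one is an arbitrary assumption of this sketch.
    return a_static


def needs_conversion(src_dyn: int, target_cp: int) -> bool:
    """True if the operand must be converted; raises on encoding $FFFF."""
    if src_dyn == target_cp:
        return False                    # already in the chosen encoding
    if CP_NONE in (src_dyn, target_cp):
        raise ValueError("cannot convert from or to encoding $FFFF")
    return True
```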

IMHO it makes much more sense to implement things like TStringList on 
the basis of RawByteString: when it is based on the default System 
encoding, there will be a double conversion whenever it is used with 
any other encoding type.
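
To make the double conversion concrete, here is a small count of 
conversions for storing a string in a container and reading it back, 
following the assignment rules given earlier. It is a Python model; 
round_trip_conversions is a hypothetical helper, container_cp stands 
for the static encoding of the container's element type, and the code 
pages are arbitrary examples.

```python
CP_NONE = 0xFFFF   # RawByteString marker ($FFFF in the post)
CP_UTF8 = 65001    # example code pages, chosen only for illustration
CP_1252 = 1252


def round_trip_conversions(caller_cp: int, container_cp: int) -> int:
    """Count conversions for store-then-load through a container."""
    count = 0
    stored_dyn = caller_cp
    # Store: a typed container with a different encoding must convert;
    # a RawByteString container just keeps the caller's encoding.
    if container_cp != CP_NONE and container_cp != caller_cp:
        count += 1
        stored_dyn = container_cp
    # Load: convert back only if the stored dynamic encoding differs.
    if stored_dyn != caller_cp:
        count += 1
    return count
```

With a UTF-8 caller, a container typed as code page 1252 gives 2 
conversions, while a RawByteString container gives 0.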

IMHO big, commonly used, arch-independent libraries that are not super 
high-performance (like LCL) should use RawByteString in their user 
interface and internally as widely as possible, so that conversions are 
avoided whenever possible (e.g. when the user's call provides a string 
and, during the work in the library, it turns out that the string is 
not actually used).

-Michael (the weird one)
