[fpc-devel] Performance of string handling in trunk
Michael Schnell
mschnell at lumino.de
Wed Jun 26 13:59:17 CEST 2013
BTW.
I think the implementation would be quite easy, straightforward, fast
and compatible.
- The compiler knows the static encoding type of each string variable.
- The dynamic encoding type of a String is preset to the static
encoding type when the string is allocated
- Only RawByteStrings (EncodingType $FFFF) are allowed to change their
dynamic encoding type. With other Strings this will lead to
unpredictable results.
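The three rules above can be modelled in a few lines. This is only a sketch in Python, not FPC code: the class `StrVar`, its field names and the code page constants other than $FFFF are assumptions made for illustration. Where the post says "unpredictable results", the model raises an error so that the rule is visible.

```python
from dataclasses import dataclass
from typing import Optional

CP_NONE = 0xFFFF  # static encoding type of RawByteString, as named in the post
CP_UTF8 = 65001   # example code page numbers, chosen for illustration
CP_1252 = 1252


@dataclass
class StrVar:
    """Hypothetical model of a string variable with two encoding types."""
    static_cp: int                    # fixed by the declaration, known to the compiler
    dynamic_cp: Optional[int] = None  # stored alongside the string data
    data: bytes = b""

    def __post_init__(self) -> None:
        # The dynamic encoding is preset to the static one on allocation.
        if self.dynamic_cp is None:
            self.dynamic_cp = self.static_cp

    def set_dynamic_cp(self, cp: int) -> None:
        # Only a RawByteString may change its dynamic encoding type.
        # The model rejects the call for any other string.
        if self.static_cp != CP_NONE:
            raise ValueError("only RawByteString may change its dynamic encoding")
        self.dynamic_cp = cp
```

A `StrVar(CP_UTF8)` starts with a dynamic encoding of `CP_UTF8` and keeps it, while a `StrVar(CP_NONE)` starts at $FFFF and can be retagged with `set_dynamic_cp`.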
When Strings are assigned:
- If the static encoding type of source and target is identical (be it
normal or RAW) (already checked by the compiler) -> the same happens as
with the pre-Unicode compiler (setting the pointer to the StringRecord
and managing the RefCount)
otherwise:
- If the target is statically defined as RawByteString (already
checked by the compiler) -> the same happens
- If the source is statically defined as RawByteString (already
checked by the compiler), code is implemented that checks if the dynamic
encoding of the source is identical to the (known to the compiler)
static encoding type of the target -> the same happens
otherwise the conversion library is called. It checks the _dynamic_
encoding type of source and target (thus it only needs to be provided
with the Strings themselves and no additional information generated by
the compiler) and does the conversion appropriately.
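The assignment rules can be written as one decision function. Again this is a Python sketch of the logic, not compiler code: `assign_action` and its parameter names are hypothetical, and the return values "share" (set the pointer and manage the RefCount) and "convert" (call the conversion library) are just labels.

```python
CP_NONE = 0xFFFF  # static encoding type of RawByteString, as named in the post


def assign_action(src_static: int, src_dynamic: int, dst_static: int) -> str:
    """Return "share" for a pointer copy with refcounting, else "convert"."""
    if src_static == dst_static:
        # Identical static types (normal or raw): decided at compile time.
        return "share"
    if dst_static == CP_NONE:
        # A RawByteString target accepts any source as it is: compile time.
        return "share"
    if src_static == CP_NONE:
        # The only case that needs a generated run-time check.
        return "share" if src_dynamic == dst_static else "convert"
    # Two different, non-raw static types: the conversion library is called.
    return "convert"
```

Only the third branch depends on run-time data. The other three outcomes follow from the static types alone, so the compiler can select them without emitting any check.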
When doing an operation on two Strings (such as "+" or a comparison), one
of the operands is (virtually) copied to a String with the same encoding
type as the other.
Here:
- if one operand is a RawByteString use the (static or dynamic)
encoding of the other.
- if both are RawByteStrings, use the dynamic encoding of one of them
(presumably this is not a separate case from the one above)
If the conversion library sees a dynamic encoding type of $FFFF for
either source or target it will fail and issue an exception.
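A sketch of how the encoding for a binary operation could be chosen, including the failure on $FFFF. It is written in Python as a model only. `plan_operation` and `check_convertible` are hypothetical names, each operand is passed as a (static, dynamic) pair, and using the first operand's encoding when neither or both operands are raw is an assumption, since the post leaves that choice open.

```python
CP_NONE = 0xFFFF  # static encoding type of RawByteString, as named in the post


def check_convertible(src_dynamic: int, dst_dynamic: int) -> None:
    """Model of the conversion library refusing $FFFF on either side."""
    if CP_NONE in (src_dynamic, dst_dynamic):
        raise ValueError("cannot convert to or from encoding $FFFF")


def plan_operation(a: tuple, b: tuple) -> int:
    """Return the encoding in which an operation on a and b is carried out."""
    (a_static, a_dyn), (b_static, b_dyn) = a, b
    if a_static == CP_NONE and b_static != CP_NONE:
        # Exactly one operand is raw: use the encoding of the other one.
        target = b_dyn
    else:
        # b is raw, both are raw, or neither is raw: use a's encoding.
        # Picking a rather than b in the last two cases is an assumption.
        target = a_dyn
    # Any operand whose dynamic encoding differs must be converted.
    for dyn in (a_dyn, b_dyn):
        if dyn != target:
            check_convertible(dyn, target)
    return target
```

With a raw operand still tagged $FFFF and a UTF-8 operand, the function raises, which matches the rule that the conversion library fails on $FFFF.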
IMHO it makes much more sense to implement things like TStringList on
the basis of RawByteString, because when it is based on the default System
encoding, there will be a double conversion when using it with any other
encoding type.
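The double conversion can be made visible with a small counting model. This is Python, not the real TStringList. `CountingList` is a hypothetical stand-in, encodings are given as Python codec names, and a storage encoding of `None` stands for RawByteString storage.

```python
class CountingList:
    """Hypothetical string list that counts the conversions it performs."""

    def __init__(self, storage_cp=None):
        self.storage_cp = storage_cp  # None models RawByteString storage
        self.items = []               # each item is a (codec name, bytes) pair
        self.conversions = 0

    def _to(self, cp, item):
        src_cp, data = item
        if cp is None or cp == src_cp:
            return item               # no conversion: keep the bytes as they are
        self.conversions += 1
        return (cp, data.decode(src_cp).encode(cp))

    def add(self, cp, data):
        # Storing converts to the storage encoding unless storage is raw.
        self.items.append(self._to(self.storage_cp, (cp, data)))

    def get(self, index, cp):
        # Reading converts to the encoding the caller asks for.
        return self._to(cp, self.items[index])[1]


text = "Gr\u00f6\u00dfe".encode("utf-8")

fixed = CountingList("cp1252")  # list fixed to a "system" encoding
fixed.add("utf-8", text)
fixed_result = fixed.get(0, "utf-8")

raw = CountingList()            # list based on raw storage
raw.add("utf-8", text)
raw_result = raw.get(0, "utf-8")
```

After this runs, `fixed.conversions` is 2 (once on the way in, once on the way out) and `raw.conversions` is 0, while both lists return the original bytes.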
IMHO big, commonly used, arch-independent libraries that are not super
high-performance (like LCL) should use RawByteString in their user
interface and internally as widely as possible, so that conversions are
avoided whenever possible (e.g. when the user's call provides a string and,
in the course of the library's work, it turns out that the string is not
actually used).
-Michael (the weird one)