[fpc-devel] Performance of string handling in trunk

Michael Schnell mschnell at lumino.de
Thu Jun 27 12:55:06 CEST 2013


On 06/26/2013 06:29 PM, Hans-Peter Diettrich wrote:
>
> Then you have two choices:
> 1) convert the string as required
> 2) copy the content unconverted, but update the encoding

What do you mean by "you have two choices"?
In fact, the compiler designer can choose among several behaviors:

  1) convert the string as required
     (seems most sensible)
  2) copy the content unconverted, but update the encoding
     (does not seem sensible at all, because the static encoding type
     of the normal target String then no longer matches the dynamic
     encoding type. Elsewhere, the code the compiler creates will
     implicitly use the static encoding type, e.g. to decide whether
     or not a conversion is necessary, and the content will be
     interpreted wrongly.)
  3) issue a warning or (better) an error at compile time for any
     assignment of a RawByteString to a normal String
     (as conversion is not implemented and not converting leads to
     unpredictable behavior)
  4) raise an exception at runtime when the types don't match
     (not nice but consistent)

Of course, appropriate "Delphi Quirks" modes could influence the
compiler in that respect.

>
> IMO a reasonable decision should take into account the use of the 
> RawByteString type in RTL code, e.g. for concatenation.
The RTL of course needs to match the compiler perfectly. But as both are
"under construction" right now (regarding the behavior of this kind of
String, whatever it is called), I think that is easily doable.
>
> Can you show us your intended code for these functions?

What functions? We are talking about compiler behavior.

I think I already wrote down what I meant (the version with just
RawByteString, not the one with an additional String type of another
name, which might be even more "attractive").
I can do this again as a matrix instead of the text version I wrote.

When assigning "such" Strings (I hope the monospace is visible in the 
List):

(The compiler does the test for encoding using the static (compile time) 
encoding type with normal strings and the dynamic (in the string record) 
encoding type value for RawByteString.)


Target \ Source                               | normal String   | RawByteString
normal String with the same static encoding   | set pointer     | set pointer (after checking dynamic encoding)
normal String with different static encoding  | call conversion | call conversion (after checking dynamic encoding)
RawByteString (dynamic type ignored)          | set pointer     | set pointer (checking dynamic encoding not necessary)



Note:
  -   if the static types match (be it Raw or not), just set the pointer.
  -   the compiler only needs to emit code to check the dynamic type
      if the source is RawByteString.
  -   the dynamic type of the target is ignored by the compiler. Only
      the conversion function will use it.
  -   the static type of source and target is not used by the conversion
      library function. It can work according to the dynamic types and
      thus just needs to be given the two string variables (pointers) in
      the call the compiler creates (in assembler object code).
  -   for a normal String, a mismatch between static and dynamic type
      (that would be erroneous in DXE as well) can't happen.
  -   for RawByteString, a "normal" dynamic type means "this is
      printable information", and a dynamic type $FFFF (assigned to the
      string when instantiating) means "this String holds just bytes
      with no encoding assumed".
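
To make the matrix concrete, here is a minimal sketch of the decision the
compiler would have to make for each assignment. It is a model in Python,
not FPC compiler or RTL code. The names Str, static_cp, dynamic_cp and
assignment_action are hypothetical and chosen only for this illustration,
and the code page numbers (65001, 1252) are just example values. A
static_cp of None stands for RawByteString.

```python
from dataclasses import dataclass
from typing import Optional

# Dynamic encoding value meaning "just bytes, no encoding assumed".
CP_NONE = 0xFFFF


@dataclass
class Str:
    # Compile-time (static) encoding; None models RawByteString.
    static_cp: Optional[int]
    # Encoding stored in the string record (dynamic encoding).
    dynamic_cp: int


def assignment_action(target: Str, source: Str) -> str:
    """Return the action from the matrix for 'target := source'."""
    # Target is RawByteString: its dynamic type is ignored,
    # so no check is needed at all.
    if target.static_cp is None:
        return "set pointer"

    if source.static_cp is not None:
        # Normal source: decided at compile time from the static types.
        source_cp = source.static_cp
    else:
        # RawByteString source: the generated code has to check
        # the dynamic encoding at run time.
        source_cp = source.dynamic_cp

    if source_cp == target.static_cp:
        return "set pointer"
    # What the conversion routine does with a CP_NONE source is
    # not specified here; this model only reports that it is called.
    return "call conversion"


utf8 = Str(static_cp=65001, dynamic_cp=65001)
latin = Str(static_cp=1252, dynamic_cp=1252)
raw_utf8 = Str(static_cp=None, dynamic_cp=65001)

print(assignment_action(utf8, latin))
print(assignment_action(utf8, raw_utf8))
print(assignment_action(raw_utf8, latin))
```

Only the RawByteString-source branch reads dynamic_cp, which matches the
note that the dynamic check is needed for that source type alone.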

-Michael







