[fpc-devel] new string - question on usage

Hans-Peter Diettrich DrDiettrich1 at aol.com
Wed Oct 12 12:09:22 CEST 2011


Michael Schnell wrote:

>> When I have a variable of type AnsiString, and assign a string to it, 
>> then its encoding is reported as 1252 (my system codepage). On Paul's 
>> machine it will have a different encoding, I assume?
>>
> Via personal consulting ( :) ) I learned that the multiple new Pascal 
> string types are just a kind of syntactic sugar for an underlying 
> common string type that is dynamically typed (and behaves that way). 
> Apparently, when allocated, these strings get an appropriate encoding 
> ID that is effective even at zero length.

The encoding is associated with string types, and every variable knows 
its type. I.e. we have a static encoding, associated with string types 
and variables, and a dynamic encoding, associated with the string data. 
This is similar to the static and dynamic types of object references.
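
A minimal sketch of the static/dynamic distinction, assuming a compiler
in which the code-page-aware string types and StringCodePage are
available (Free Pascal 3.0 or later; they were still under development
when this thread was written). The program name is hypothetical. R is
declared as RawByteString, yet after the assignment it refers to data
whose dynamic encoding is whatever U carried, so both calls are expected
to report the same code page:

```pascal
program StaticVsDynamic;
{$mode objfpc}{$H+}

var
  U: UTF8String;      // static encoding: 65001 (UTF-8)
  R: RawByteString;   // static type compatible with any dynamic encoding
begin
  U := 'abc';
  R := U;             // no conversion expected: R takes over U's data as-is

  // StringCodePage reads the dynamic encoding stored with the string data.
  WriteLn('U: ', StringCodePage(U));
  WriteLn('R: ', StringCodePage(R));
end.
```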

> Apparently (contrary to what I assumed) a " := " between new strings 
> does not preserve the encoding, but performs an encoding conversion to 
> the target's encoding ID.

Right. The encoding etc., as stored in the string header, is used while 
processing strings, e.g. in expressions. In the assignment to a variable 
the static encoding of that variable must be compared with the dynamic 
encoding of the string data, and a conversion must be performed whenever 
required.
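
The conversion on assignment can be sketched as follows, again assuming
Free Pascal 3.0 or later. Latin1String is a hypothetical type name
introduced only for this illustration, and the cwstring unit is pulled
in on Unix because code page conversion there relies on an installed
widestring manager. The source string holds the two UTF-8 bytes of "ä";
where a conversion actually takes place, the target should end up
shorter than the source:

```pascal
program AssignConverts;
{$mode objfpc}{$H+}

{$ifdef unix}
uses cwstring;   // conversion support on Unix
{$endif}

type
  // Hypothetical strict type with ISO-8859-1 as its static encoding.
  Latin1String = type AnsiString(28591);

var
  U: UTF8String;
  L: Latin1String;
begin
  // Build the UTF-8 byte sequence C3 A4 ("ä") byte by byte, so the
  // result does not depend on the encoding of this source file.
  SetLength(U, 2);
  U[1] := #$C3;
  U[2] := #$A4;

  // Static encodings differ (65001 vs. 28591), so the assignment is
  // where a conversion is expected rather than a plain copy.
  L := U;

  WriteLn('Length(U) = ', Length(U), ', code page ', StringCodePage(U));
  WriteLn('Length(L) = ', Length(L), ', code page ', StringCodePage(L));
end.
```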

> So for preventing a conversion, you need to make sure that the target 
> has the same (or a compatible) encoding ID as the source. (Either by 
> using the appropriate string types

Right, the new string types are *strict* types, declared as
   type UTF8String = type AnsiString(65001);
Note the second "type", denoting a new type, not an alias as in the old 
declaration of
   type UTF8String = AnsiString;

> (hoping that the encoding ID has not 
> been changed ) or by using SetCodePage.)

SetCodePage is applicable only to RawByteString, because this static 
type is compatible with all dynamic types - like TObject is compatible 
with all derived classes.
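
A sketch of SetCodePage applied to a RawByteString, assuming the
signature found in the Free Pascal 3.x System unit (a var RawByteString,
the new code page, and a Convert flag). Passing False for Convert is
meant to relabel the bytes without touching them; the program name is
hypothetical:

```pascal
program RelabelRaw;
{$mode objfpc}{$H+}

var
  R: RawByteString;
begin
  // Two raw bytes that happen to be the UTF-8 encoding of "ä".
  SetLength(R, 2);
  R[1] := #$C3;
  R[2] := #$A4;

  WriteLn('before: ', StringCodePage(R));

  // Convert = False: change only the dynamic encoding recorded with the
  // string data, leaving the bytes themselves untouched.
  SetCodePage(R, 65001, False);

  WriteLn('after:  ', StringCodePage(R), ', length ', Length(R));
end.
```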

> I suppose there also is a 
> function designed to do a "pure" code-ID preserving assignment.

Quite unlikely; this would defeat the idea of static typing. Low-level 
hacking is possible, of course, but the effects are unpredictable. The 
compiler assumes that the dynamic encoding matches the static one, and 
generates code accordingly.

> I suppose a variable of the type "String" is pre-loaded with the 
> predefined "System" encoding ID.

No, empty strings still are Nil pointers.
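
This is easy to check with a pointer typecast. A small sketch (the
program name is hypothetical): an empty string has no header in which an
encoding ID could be stored, so the first comparison yields TRUE and the
second FALSE:

```pascal
program EmptyIsNil;
{$mode objfpc}{$H+}

var
  S: UTF8String;
begin
  S := '';
  WriteLn(Pointer(S) = nil);   // empty string: no data, no header

  S := 'x';
  WriteLn(Pointer(S) = nil);   // non-empty: points at allocated string data
end.
```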

DoDi



