[fpc-devel] new string - question on usage
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Wed Oct 12 12:09:22 CEST 2011
Michael Schnell schrieb:
>> When I have a variable of type AnsiString, and assign a string to it,
>> then its encoding is reported as 1252 (my system codepage). On Paul's
>> machine it will have a different encoding, I assume?
>>
> Via personal consulting ( :) ) I learned that the multiple new Pascal
> string types are just a kind of syntactic sugar for an underlying common
> dynamically typed (and functioning in that way) string type. Seemingly,
> when allocated, these strings get an appropriate encoding ID that is
> effective even with a zero length.
The encoding is associated with string types, and every variable knows
its type. I.e. we have a static encoding, associated with string types
and variables, and a dynamic encoding of string data. Similar to the
static and dynamic types of object references.
> Seemingly (contrary to what I assumed) a " := " between new strings does
> not preserve the encoding, but performs an encoding conversion to the
> target's encoding ID.
Right. The encoding etc., as stored in the string header, is used while
processing strings, e.g. in expressions. In the assignment to a variable
the static encoding of that variable must be compared with the dynamic
encoding of the string data, and a conversion must be performed whenever
required.
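
To make the conversion on assignment concrete, here is a minimal sketch, assuming a compiler version in which the code-page-aware AnsiStrings are available (FPC 3.0 or later). TCP1252String is a hypothetical type name introduced only for this example, and the use of the cwstring unit on Unix is an assumption made so that conversion routines are installed there.

```pascal
program AssignConv;
{$mode objfpc}{$H+}
{$ifdef unix}
uses cwstring;  // assumption: provides the conversion routines on Unix
{$endif}

type
  // Hypothetical strict type whose static encoding is 1252
  TCP1252String = type AnsiString(1252);

var
  U: UTF8String;
  W: TCP1252String;
begin
  // Build the UTF-8 bytes of U+00E9 by hand, so the example does not
  // depend on the encoding of the source file.
  SetLength(U, 2);
  U[1] := #$C3;
  U[2] := #$A9;

  // The static encodings differ (65001 vs. 1252), so a conversion is
  // expected here instead of a plain reference copy.
  W := U;

  WriteLn('U: codepage ', StringCodePage(U), ', length ', Length(U));
  WriteLn('W: codepage ', StringCodePage(W), ', length ', Length(W));
end.
```

If the conversion takes place as described, the two lines should report different code pages, and W should come out shorter than U.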
> So for preventing a conversion, you need to make sure that the target
> has the same (or a compatible) encoding ID as the source. (Either by
> using the appropriate string types
Right, the new string types are *strict* types, declared as
  type UTF8String = type AnsiString(65001);
Note the second "type", denoting a new type, not an alias as in the old
declaration of
  type UTF8String = AnsiString;
> (hoping that the encoding ID has not
> been changed) or by using SetCodePage.)
SetCodePage is applicable only to RawByteString, because this static
type is compatible with all dynamic types - like TObject is compatible
with all derived classes.
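
A minimal sketch of this, again assuming FPC 3.0 or later: the SetCodePage signature used below (string, code page, Convert flag) and the CP_UTF8 constant are taken from the System unit of those versions, and cwstring on Unix is an assumption as before.

```pascal
program RawDemo;
{$mode objfpc}{$H+}
{$ifdef unix}
uses cwstring;  // assumption: provides the conversion routines on Unix
{$endif}

var
  U: UTF8String;
  R: RawByteString;
begin
  // UTF-8 bytes of U+00E9, built by hand
  SetLength(U, 2);
  U[1] := #$C3;
  U[2] := #$A9;

  // RawByteString is compatible with every dynamic encoding, so the
  // data is expected to keep its UTF-8 label here.
  R := U;
  WriteLn('after R := U: ', StringCodePage(R));

  // Convert = False only relabels the data; the bytes stay untouched.
  SetCodePage(R, 1252, False);
  WriteLn('relabelled:   ', StringCodePage(R), ', length ', Length(R));

  // Restore the label, then really convert: the bytes are re-encoded.
  SetCodePage(R, CP_UTF8, False);
  SetCodePage(R, 1252, True);
  WriteLn('converted:    ', StringCodePage(R), ', length ', Length(R));
end.
```

Relabelling without conversion is exactly the kind of operation that breaks the compiler's assumptions when applied to anything but a RawByteString, which is why it is confined to that type here.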
> I suppose there also is a
> function that does a "pure" code-ID preserving assignment.
Quite unlikely, since this defeats the idea of static typing. Low-level
hacking is possible, of course, but the effects are unpredictable. The
compiler assumes that the dynamic encoding matches the static one, and
generates code accordingly.
> I suppose a variable of the type "String" is pre-loaded with the
> predefined "System" encoding ID.
No, empty strings still are Nil pointers.
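
This can be checked with a small sketch, assuming FPC 3.0 or later. An empty string has no header that could carry an encoding ID, so whatever StringCodePage reports for it is a fallback value chosen by the RTL; it is printed next to DefaultSystemCodePage only for comparison.

```pascal
program EmptyDemo;
{$mode objfpc}{$H+}

var
  S: AnsiString;
  U: UTF8String;
begin
  S := '';
  U := '';

  // Both variables are expected to hold a Nil pointer, regardless of
  // their static encoding.
  WriteLn(Pointer(S) = nil);
  WriteLn(Pointer(U) = nil);

  // No string header exists, so no dynamic encoding is stored anywhere.
  WriteLn(StringCodePage(U), ' ', DefaultSystemCodePage);
end.
```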
DoDi