[fpc-devel] String and UnicodeString and UTF8String

Michael Schnell mschnell at lumino.de
Wed Jan 12 10:07:03 CET 2011


On 01/11/2011 05:50 PM, Hans-Peter Diettrich wrote:
>
> Since the generic Delphi "string" type can be any Unicode encoding now,

This
> From what I read I understand
> that the dynamically coded string type can hold 1, 2, and 4 byte (maybe
> even more) codes for its elements (denoted in one control value) and
> each of those (theoretically) in different coding schemes (denoted in
> another control value), allowing e.g. for UTF-8, UTF-16, UCS4, German
> ANSI, raw byte string....

is what I (not owning a Delphi newer than 2007) thought, too, and have
been bashed for.

But the document "Delphi and Unicode" by Marco Cantu (
http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf
), dated November 2008, in fact states:

length, the second element is the reference count. In Delphi 2009 the
representation for reference-counted strings becomes:

Offset:  -12         -10         -8          -4        String reference address
Field:   Code page   Elem size   Ref count   length    First char of string

Beside the length and reference count, the new fields represent the
element size and the code page. While the element size is used to
discriminate between AnsiString and UnicodeString, the code page makes
sense in particular for the AnsiString type (as it works in Delphi 2009),
as the UnicodeString type has the fixed code page 1200.
A corresponding support data structure is declared in the implementation
section of System unit as:
type
  PStrRec = ^StrRec;
  StrRec = packed record
    codePage: Word;
    elemSize: Word;
    refCnt: Longint;
    length: Longint;
  end;
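
To check that this record really matches the offsets in the table above, here is a minimal sketch for Free Pascal. It redeclares the record locally, because the original lives in the implementation section of the System unit and is not visible to user code. The program name and the helper OffsetFromFirstChar are made up for illustration, and the sketch assumes the header sits directly in front of the first character, as the table suggests.

```pascal
program StrRecLayout;

{$mode objfpc}

type
  PStrRec = ^StrRec;
  // Local copy of the record quoted from the document (assumption:
  // this mirrors the Delphi 2009 layout described there).
  StrRec = packed record
    codePage: Word;
    elemSize: Word;
    refCnt: Longint;
    length: Longint;
  end;

var
  rec: StrRec;

// Hypothetical helper: offset of a field relative to the end of the
// header, i.e. relative to where the first character would be stored.
function OffsetFromFirstChar(field: Pointer): PtrInt;
begin
  Result := PtrInt(PtrUInt(field) - PtrUInt(@rec)) - SizeOf(StrRec);
end;

begin
  WriteLn('SizeOf(StrRec): ', SizeOf(StrRec));
  WriteLn('codePage: ', OffsetFromFirstChar(@rec.codePage));
  WriteLn('elemSize: ', OffsetFromFirstChar(@rec.elemSize));
  WriteLn('refCnt:   ', OffsetFromFirstChar(@rec.refCnt));
  WriteLn('length:   ', OffsetFromFirstChar(@rec.length));
end.
```

Since the record is packed (2 + 2 + 4 + 4 bytes), this should report a size of 12 and the offsets -12, -10, -8 and -4, matching the table. It says nothing about how a particular compiler version actually lays out its strings; it only shows that the quoted record and the quoted table agree with each other.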

But maybe the document is outdated.

-Michael



