[fpc-devel] String and UnicodeString and UTF8String
Michael Schnell
mschnell at lumino.de
Wed Jan 12 10:07:03 CET 2011
On 01/11/2011 05:50 PM, Hans-Peter Diettrich wrote:
>
> Since the generic Delphi "string" type can be any Unicode encoding now,
This
> From what I read I understand
> that the dynamically coded string type can hold 1, 2, and 4 byte (maybe
> even more) codes for its elements (denoted in one control-value) and
> each of those (theoretically) in different coding schemes (denoted in
> another control-value), allowing e.g. for UTF-8, UTF-16, UCS4, German
> ANSI, raw Byte, string....
is what I (not owning a Delphi > 2007) thought, too, and have been
bashed for.
But the document "Delphi and Unicode" by Marco Cantu (
http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf
), dated Nov 2008, in fact states:
length, the second element is the reference count. In Delphi 2009 the
representation for reference-counted strings becomes:
  -12         -10         -8          -4        String reference address
  Code page   Elem size   Ref count   Length    First char of string
Beside the length and reference count, the new fields represent the
element size and the code page. While the element size is used to
discriminate between AnsiString and UnicodeString, the code page makes
sense in particular for the AnsiString type (as it works in Delphi
2009), as the UnicodeString type has the fixed code page 1200.
A corresponding support data structure is declared in the
implementation section of System unit as:
type
  PStrRec = ^StrRec;
  StrRec = packed record
    codePage: Word;
    elemSize: Word;
    refCnt: Longint;
    length: Longint;
  end;
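The offsets in the diagram follow directly from the field sizes in the packed record quoted above (2 + 2 + 4 + 4 = 12 bytes placed in front of the first character). As a rough sanity check, here is a small sketch that models that layout. It is Python, not Delphi, and it only simulates the memory block; the names STR_REC, make_string_block, read_header and field_offsets are made up for illustration, and little-endian byte order plus a length counted in elements (not bytes) are assumptions on my part.

```python
import struct

# Header layout taken from the quoted StrRec record: two 16-bit words
# followed by two 32-bit integers, packed without padding.
# Little-endian byte order is an assumption (x86).
STR_REC = struct.Struct("<HHii")  # codePage, elemSize, refCnt, length

UNICODE_CODE_PAGE = 1200  # fixed code page of UnicodeString per the document
UNICODE_ELEM_SIZE = 2     # bytes per element for UnicodeString


def make_string_block(text, ref_count=1):
    """Simulate a UnicodeString block: header bytes followed by the payload.

    Returns (block, payload_offset), where payload_offset plays the role
    of the "string reference address" in the diagram.
    """
    payload = text.encode("utf-16-le")
    # Assumption: the length field counts elements, not bytes.
    length = len(payload) // UNICODE_ELEM_SIZE
    header = STR_REC.pack(UNICODE_CODE_PAGE, UNICODE_ELEM_SIZE,
                          ref_count, length)
    return header + payload, STR_REC.size


def read_header(block, payload_offset):
    """Read the four fields stored just before the first character."""
    return STR_REC.unpack_from(block, payload_offset - STR_REC.size)


def field_offsets():
    """Offset of each field relative to the first character."""
    names = ("codePage", "elemSize", "refCnt", "length")
    sizes = (2, 2, 4, 4)
    offsets = {}
    pos = -sum(sizes)
    for name, size in zip(names, sizes):
        offsets[name] = pos
        pos += size
    return offsets


if __name__ == "__main__":
    block, offset = make_string_block("Hello")
    print(field_offsets())
    print(read_header(block, offset))
```

The computed offsets come out as -12, -10, -8 and -4, matching the diagram, so at least the record and the picture in the document agree with each other.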
But maybe the document is outdated.
-Michael