[fpc-devel] Unicode and UTF8String
Luiz Americo Pereira Camara
pascalive at bol.com.br
Mon Dec 1 23:04:46 CET 2008
Marco van de Voort wrote:
> In our previous episode, Luiz Americo Pereira Camara said:
>>>> string[index], copy, pos, length have always been part of Pascal.
>>> So keep using ansistring? It doesn't change.
>> Not true if fpc will follow Delphi. The new AnsiString type will be also
>> automatically converted in Delphi 2009.
> As far as I know, the default is still "ascii in the default system ascii
>> See the Marco Cantu doc about
>> Unicode (linked some threads ago).
> I got it from Alan Bauers blog in may (before Tiburon was out), but while
> ansistring changes, afaik the widestring to ansistring-without-qualifier
> stays the same?
The doc I was referring to is: http://dn.codegear.com/article/38980
"The change in the definition of the Char type is important because it
is tied to the change in the definition of the string type. Unlike
characters, though, string is mapped to a brand new data type that
didn't exist before, called UnicodeString. As we'll see, its internal
representation is also quite different from that of the classic
AnsiString type (I'm using the specific terms classic AnsiString type,
to refer to the string type as it used to work from Delphi 2 until
Delphi 2007; the AnsiString type is still part of Delphi 2009, but it
has a modified behavior, so when referring its past structure I'll use
the term classic AnsiString)."

"AnsiString is a single-byte-per-character string type based on the
current code page of the operating system, closely matching the classic
AnsiString of past versions of Delphi;"

"RawByteString is an array of characters with no code page set, on which
conversion is accomplished by the system (thus partially resembling the
classic AnsiString, when used as a pure character array)."
THE NEW ANSISTRING TYPE

Differently from the past, the new AnsiString type carries one further
piece of information, the code page of the characters in the string.
The DefaultSystemCodePage defaults to CP_ACP, the current Windows code
page, but it could be modified by calling the special procedure,
SetMultiByteConversionCodePage. You can do this to force a program to
work (by default) with characters in a given code page (that the
operating system installation must support, of course).

In general, instead, you'd either stick to the current code page or
change it for individual strings, calling the SetCodePage procedure
(introduced earlier while talking about characters and code pages).
This procedure can be called in two different ways. In the first case,
you change the code page of a string (maybe loaded by a separate file
or socket) because you know its format. In the second case, you can
call it to convert a given string (something that happens automatically
when assigning a string to one of a different code page, as discussed
later).
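
To make those two uses of SetCodePage concrete, here is a minimal sketch
of how I understand them. It assumes Delphi 2009 or later (or an FPC
version whose RTL provides the same SetCodePage and StringCodePage
routines). The program name, the sample byte $E9 and the code page
numbers 1252 and 65001 are my own choices for illustration; they are not
taken from the document above.

```pascal
program CodePageSketch;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  S: RawByteString;
begin
  // Pretend this single byte came from a file or socket that we know
  // is encoded in Windows-1252 (assumption: $E9 is "e with acute" there).
  SetLength(S, 1);
  S[1] := AnsiChar($E9);

  // First case: Convert = False only sets the code page stored with
  // the string; the bytes themselves are left untouched.
  SetCodePage(S, 1252, False);
  Writeln('tagged:    code page ', StringCodePage(S), ', length ', Length(S));

  // Second case: Convert = True asks the RTL to transcode the bytes
  // from the string's current code page to the requested one (UTF-8).
  SetCodePage(S, 65001, True);
  Writeln('converted: code page ', StringCodePage(S), ', length ', Length(S));
end.
```

If the conversion behaves as the document describes, the second line
should report a different byte length than the first, since the same
character is encoded differently in UTF-8. On non-Windows FPC targets a
widestring manager (for example the cwstring unit) may be needed for the
conversion step to do real work; I have not checked that here.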