[fpc-devel] Unicode and UTF8String

Luiz Americo Pereira Camara pascalive at bol.com.br
Mon Dec 1 23:04:46 CET 2008

Marco van de Voort wrote:
> In our previous episode, Luiz Americo Pereira Camara said:
>>>> string[index], copy, pos, length have always been part of Pascal.
>>> So keep using ansistring? It doesn't change.
>> Not true if fpc will follow Delphi. The new AnsiString type will be also 
>> automatically converted in Delphi 2009.
> As far as I know, the default is still "ascii in the default system ascii
> encoding".
>> See the Marco Cantu doc about 
>> Unicode (linked some threads ago).
> I got it from Alan Bauers blog in may (before Tiburon was out), but while
> ansistring changes, afaik the widestring to ansistring-without-qualifier
> stays the same?
The doc I was referring to is: http://dn.codegear.com/article/38980

Some quotes:
"The change in the definition of the Char type is important because it
is tied to the change in the definition of the string type. Unlike
characters, though, string is mapped to a brand new data type that
didn't exist before, called UnicodeString. As we'll see, its internal
representation is also quite different from that of the classic
AnsiString type (I'm using the specific terms classic AnsiString type,
to refer to the string type as it used to work from Delphi 2 until
Delphi 2007; the AnsiString type is still part of Delphi 2009, but it
has a modified behavior, so when referring its past structure I'll use
the term classic AnsiString)."

"AnsiString is a single-byte-per-character string type based on the
current code page of the operating system, closely matching the classic
AnsiString of past versions of Delphi;"

"RawByteString is an array of characters with no code page set, on which
no character conversion is accomplished by the system (thus partially
resembling the classic AnsiString, when used as a pure character
array)."

Differently from the past, the new AnsiString type carries one further
piece of information, the code page of the characters in the string.
The DefaultSystemCodePage variable defaults to CP_ACP, the current
Windows code page, but it could be modified by calling the special
procedure, SetMultiByteConversionCodePage. You can do this to force a
program to work (by default) with characters in a given code page (that
the operating system installation must support, of course).
In general, instead, you'd either stick to the current code page or
change it for individual strings, calling the SetCodePage procedure
(introduced earlier while talking about characters and code pages).
This procedure can be called in two different ways. In the first case,
you change the code page of a string (maybe loaded by a separate file
or socket) because you know its format. In the second case, you can
call it to convert a given string (something that happens automatically
when assigning a string to one of a different code page, as discussed
later).
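
To make the two ways of calling SetCodePage concrete, here is a minimal
sketch of how I understand the quoted text, assuming Delphi 2009 on
Windows with code page 1252 installed. The program name and the two
constants CP_UTF8_ID and CP_1252_ID are my own, chosen for illustration;
SetCodePage, StringCodePage and RawByteString are the System unit names
from Delphi 2009. I have not checked how (or whether) fpc will mirror
this.

```pascal
program CodePageSketch;
{$APPTYPE CONSOLE}

const
  CP_UTF8_ID = 65001;  // illustrative name: UTF-8 code page number
  CP_1252_ID = 1252;   // illustrative name: Windows Western European

var
  S: RawByteString;
begin
  // Build the bytes C3 A9 by hand, as if they came from a file or socket.
  // Together they are the UTF-8 encoding of U+00E9 (e with acute accent).
  SetLength(S, 2);
  S[1] := AnsiChar($C3);
  S[2] := AnsiChar($A9);

  // First way: we know the format, so only the code page tag is changed
  // (Convert = False). The bytes themselves are left untouched.
  SetCodePage(S, CP_UTF8_ID, False);
  WriteLn('code page: ', StringCodePage(S), '  length: ', Length(S));

  // Second way: ask for a real conversion (Convert = True). The payload
  // is transcoded, so the two UTF-8 bytes should collapse into the single
  // Windows-1252 byte $E9 and the tag should change to 1252.
  SetCodePage(S, CP_1252_ID, True);
  WriteLn('code page: ', StringCodePage(S), '  length: ', Length(S));
end.
```

The sketch uses a RawByteString variable so that no implicit conversion
takes place when the string is passed to SetCodePage; with a plain
AnsiString variable the usual idiom is a cast such as
SetCodePage(RawByteString(A), CP_UTF8_ID, False).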
