[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Sven Barth pascaldragon at googlemail.com
Wed Nov 26 15:05:16 CET 2014


On 26.11.2014 12:37, "Michael Schnell" <mschnell at lumino.de> wrote:
>
> On 11/26/2014 12:09 PM, Sven Barth wrote:
>>
>>  In Delphi (and FPC) CP_ACP corresponds by default with the current
>> system codepage (e.g. CP1252 on a German Windows).
>
>
> OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
> String(CP1252) but different from String without brackets which in turn is
> the same as String(CP_UTF16) ? Correct ?

There is no "String with brackets". You can only use "AnsiString" followed
by brackets, not "String". And "String" in Delphi 2009+ is the same as
UnicodeString, which is a different compiler-internal type than
AnsiString(CP_UTF16) would be if it were allowed.
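
A minimal sketch of the declarations in question, assuming FPC 3.0 or later
(the same syntax is used by Delphi 2009+). The type names TCp1252String and
TUtf8String are made up for illustration, and the text is kept pure ASCII so
that no particular widestring manager is needed for the conversion:

```pascal
program CodePageTypes;
{$mode objfpc}{$H+}

type
  // The code page goes in brackets after AnsiString, never after String.
  TCp1252String = type AnsiString(1252);
  TUtf8String   = type AnsiString(CP_UTF8);

var
  A: TCp1252String;
  U: UnicodeString;   // separate compiler type with 2-byte elements
begin
  U := 'Gruesse';
  A := TCp1252String(U);   // converted from UTF-16 to CP1252
  WriteLn(Length(A), ' ', Length(U));
end.
```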

>
>> CP_UTF16 is not supported, because AnsiString only supports 1-Byte
>> character strings (and UTF-8 as the odd one) and not 2-Byte character
>> strings.
>
>
> I still don't understand. The wiki article seems to suggest that it is
> about a type called "ANSIString" that features a dynamically settable "code
> page information". From discussions about Delphi and FPC, I only know a
> String type with a dynamically settable "code page information" that also
> features a dynamically settable "Bytes per Character information" and hence
> does support 1, 2 and 4 "Bytes per Character". (e.g. UTF-8, UTF-16, and
> UTF-32).

While both AnsiString and UnicodeString have the current codepage and the
character size in their header record, the code page is only used for
AnsiString and the size cannot be influenced in any way (for an AnsiString
it's always 1 and for a UnicodeString it's always 2). There is no UTF-32
string (at least not in the sense of a compiler-provided type).
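
The two header fields can be read back at run time. A small sketch, assuming
FPC 3.0 or later, where the System unit provides StringElementSize and
StringCodePage for both string types; the code page values printed depend on
the platform and are not shown here:

```pascal
program HeaderFields;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'abc';
  U := 'abc';
  // The element size is fixed by the type: 1 for AnsiString,
  // 2 for UnicodeString. Only the code page field can vary.
  WriteLn('AnsiString:    element size ', StringElementSize(A),
          ', code page ', StringCodePage(A));
  WriteLn('UnicodeString: element size ', StringElementSize(U),
          ', code page ', StringCodePage(U));
end.
```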

>
>
>> The difference to Delphi currently is that for FPC
>> String=AnsiString(CP_ACP) and for Delphi String=UnicodeString (aka 2-Byte
>> string).
>>
>
> I understand that you mean (e.g.) Delphi XE. But what version of FPC is
> "currently".

FPC corresponds to none of them, because when Delphi introduced the code
page aware AnsiString it switched at the same time from having
String=AnsiString to String=UnicodeString. FPC did only the first part for
now (so at best FPC would be a "not quite 2009" :P ).

> Am I wrong assuming that in the svn we do have the "NewStrings" library
> that supports dynamical code-page *and* byte-per-character settings and
> hence supports e.g. CP1251, UTF-8, UTF-16, and UTF-32 ? So I seem to
> understand the meaning of String(CP1252), String(CP_UTF8), and
> String(CP_UTF16) (which seems to be the Delphi notation), but I seemingly
> don't get the exact meaning of "AnsiString(CP_ACP)" or "AnsiString(CP1251)"

No. The Delphi notation is the same as in FPC: AnsiString(codepage).
And an AnsiString(CP_1251) normally holds string data encoded with the
CP-1251 codepage, while an AnsiString(CP_ACP) holds string data encoded with
whatever encoding the DefaultSystemCodePage denoted at the time of
assignment. This can be for example CP_1251 as well or something different
like CP_UTF8 (it can however not be CP_ACP again nor CP_UTF16 nor CP_UTF32).
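
A rough sketch of the CP_ACP behaviour described above, assuming FPC 3.0 or
later and using SetMultiByteConversionCodePage from the System unit to change
DefaultSystemCodePage. The values printed depend on the platform and on the
installed widestring manager, so none are shown:

```pascal
program AcpString;
{$mode objfpc}{$H+}

var
  S: AnsiString;      // AnsiString(CP_ACP) under {$H+}
  U: UnicodeString;
begin
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  U := 'abc';
  // Converted to whatever DefaultSystemCodePage denotes right now.
  S := AnsiString(U);
  WriteLn('StringCodePage(S)     = ', StringCodePage(S));

  // A new default applies to later assignments only.
  SetMultiByteConversionCodePage(CP_UTF8);
  S := AnsiString(U);
  WriteLn('after the switch      = ', StringCodePage(S));
end.
```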

> In the end, what the definition of "String" without brackets is, might be
> due to a settable compiler option and/or the OS the compiler is set to
> create code for.

That is already the case:

- any mode, H- : ShortString
- any mode except delphi_unicode, H+ : AnsiString(CP_ACP)
- mode delphi_unicode, H+ : UnicodeString
(there's also a modeswitch to change String to UnicodeString, but I forgot
its name -.-)
Please note that these switches are always per unit as precompiled units
(like the RTL ones) can not be influenced.
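
To see which of the three cases applies in a given unit, one can inspect the
element size of String. A small sketch, assuming FPC 3.0 or later; note that
the directive itself is spelled {$mode delphiunicode}, and that the modeswitch
mentioned above is, as far as I know, {$modeswitch unicodestrings}:

```pascal
program StringAlias;
{$mode delphiunicode}   // try {$mode objfpc}{$H+} or {$H-} instead

var
  S: String;
begin
  S := 'abc';
  // 2 when String = UnicodeString, 1 for AnsiString and ShortString.
  WriteLn('SizeOf(S[1]) = ', SizeOf(S[1]));
  // Pointer size for the managed string types, 256 for a ShortString.
  WriteLn('SizeOf(S)    = ', SizeOf(S));
end.
```

Since the setting is per unit, the same check compiled into two units with
different directives can give different results within one program.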

Regards,
Sven