[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Wed Nov 26 20:17:14 CET 2014
Michael Schnell wrote:
> On 11/26/2014 12:09 PM, Sven Barth wrote:
>> In Delphi (and FPC) CP_ACP corresponds by default with the current
>> system codepage (e.g. CP1252 on a German Windows).
>
> OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as
> String(CP1252) but different from String without brackets which in turn
> is the same as String(CP_UTF16) ? Correct ?
CP_ACP and CP_NONE each describe a *static* encoding and have a fixed
value (CP_ACP=0, CP_NONE=$FFFF). The dynamic encoding of strings kept
in AnsiString(0) or RawByteString variables must be obtained from the
string itself. When the string is empty, StringCodePage returns
DefaultSystemCodePage (for CP_ACP).
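
To make the static/dynamic distinction concrete, here is a minimal
sketch, assuming an FPC version whose system unit already provides
StringCodePage, DefaultSystemCodePage and the CP_xxx constants. The
program and variable names are made up for illustration, and the
printed values depend on the platform and compiler settings, so none
are shown.

```pascal
program CodePageSketch;
{$mode objfpc}{$H+}

var
  A: AnsiString;      // static (declared) code page: CP_ACP
  U: UTF8String;      // static (declared) code page: CP_UTF8
  R: RawByteString;   // static (declared) code page: CP_NONE
begin
  A := 'abc';
  U := 'abc';
  R := U;             // assigning to a RawByteString does not convert the data

  // The fixed values of the static code page constants.
  WriteLn('CP_ACP  = ', CP_ACP);
  WriteLn('CP_NONE = ', CP_NONE);

  // The dynamic code page is read from each string itself.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  WriteLn('StringCodePage(A)     = ', StringCodePage(A));
  WriteLn('StringCodePage(U)     = ', StringCodePage(U));
  WriteLn('StringCodePage(R)     = ', StringCodePage(R));

  // An empty string has no header to read the code page from.
  A := '';
  WriteLn('StringCodePage(empty) = ', StringCodePage(A));
end.
```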
>> CP_UTF16 is not supported, because AnsiString only supports 1-Byte
>> character strings (and UTF-8 as the odd one) and not 2-Byte character
>> strings.
>
> I still don't understand. The wiki article seems to suggest that it is
> about a type called "ANSIString" that features a dynamically settable
> "code page information". From discussions about Delphi and FPC, I only
> know a String type with a dynamically settable "code page information"
> that also features a dynamically settable "Bytes per Character
> information" and hence does support 1, 2 and 4 "Bytes per Character".
> (e.g. UTF-8, UTF-16, and UTF-32).
You should have noticed that there exists no String or Char type, that
would allow for arbitrary bytes/char counts (see my other answer for
details).
>> The difference to Delphi currently is that for FPC
>> String=AnsiString(CP_ACP) and for Delphi String=UnicodeString (aka
>> 2-Byte string).
>>
>
> I understand that you mean (e.g.) Delphi XE. But what version of FPC is
> "currently". Am I wrong assuming that in the svn we do have the
> "NewStrings" library that supports dynamical code-page *and*
> byte-per-character settings and hence supports e.g. CP1251, UTF-8,
> UTF-16, and UTF-32 ?
The byte-per-character field is read-only, just like for any dynamic array.
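
As a rough illustration, the element size can be queried but not set.
This sketch assumes the StringElementSize function of the FPC system
unit is available in the compiler version at hand. The program and
variable names are hypothetical.

```pascal
program ElementSizeSketch;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  W: UnicodeString;
begin
  A := 'abc';
  W := 'abc';

  // The element size follows from the string type; there is no setter for it.
  WriteLn('AnsiString element size:    ', StringElementSize(A));
  WriteLn('UnicodeString element size: ', StringElementSize(W));
end.
```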
> So I seem to understand the meaning of
> String(CP1252), String(CP_UTF8), and String(CP_UTF16) (which seems to be
> the Delphi notation), but I seemingly don't get the exact meaning of
> "AnsiString(CP_ACP)" or "AnsiString(CP1251)"
The Delphi notation is the same, e.g. AnsiString(CP_ACP).
> In the end, what the definition of "String" without brackets is, might
> be due to a settable compiler option and/or the OS the compiler is set
> to create code for.
Right, the *generic* String type can be mapped to either ShortString,
AnsiString(0) or UnicodeString, depending on compiler versions and
switches. A rough guess can be derived from SizeOf(Char).
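
A minimal sketch of such a guess, assuming nothing beyond the standard
SizeOf intrinsic. TGenericString is a hypothetical alias introduced
only so that the size of the generic String type can be queried. The
result changes with the mode and the {$H} switch used to compile it.

```pascal
program GenericStringSketch;
{$mode objfpc}{$H+}

type
  TGenericString = String;   // whatever the generic String currently maps to

begin
  // The size of Char hints at the string type: 1 byte or 2 bytes per element.
  case SizeOf(Char) of
    1: WriteLn('Char is 1 byte: String is ShortString or AnsiString');
    2: WriteLn('Char is 2 bytes: String is UnicodeString');
  end;

  // A ShortString is a fixed-size buffer, while AnsiString and
  // UnicodeString variables are pointer-sized references.
  WriteLn('SizeOf(String)  = ', SizeOf(TGenericString));
  WriteLn('SizeOf(Pointer) = ', SizeOf(Pointer));
end.
```

This remains only a guess: a 1-byte Char does not by itself tell
ShortString from AnsiString, which is why the second check is added.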
DoDi