[fpc-pascal] String theory

Jürgen Hestermann juergen.hestermann at gmx.de
Tue May 10 18:35:00 CEST 2016



On 2016-05-10 at 17:48, Tony Whyman wrote:
> I don't think this is what I meant as StringCodePage is a unicode string function. I am looking at the single byte string types.
>
> On 10/05/16 14:15, Bart wrote:
>> It already is part of the string type.
>> See the StringCodePage function.
>

Code pages are not restricted to Unicode.
Other code pages can be used too (although that should only be done if Unicode is not an option for some reason).
AnsiString uses single-byte code units and can carry non-Unicode code pages.
From http://wiki.freepascal.org/FPC_Unicode_support:
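To illustrate, here is a minimal sketch, assuming FPC 3.0 or later. The type names TWesternString and TCyrillicString are made up for the example. It declares two AnsiString types with single-byte, non-Unicode code pages and queries them with StringCodePage, which as far as I know takes a RawByteString and therefore accepts any AnsiString type. On Unix-like systems it uses the cwstring unit so the RTL can actually convert between code pages.

```pascal
program SingleByteCodePages;

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // widestring manager, needed for code page conversions on Unix
{$endif}

type
  // Hypothetical type names for two single-byte, non-Unicode code pages
  TWesternString  = type AnsiString(1252);  // Windows-1252 (Western European)
  TCyrillicString = type AnsiString(1251);  // Windows-1251 (Cyrillic)

var
  Source: UnicodeString;
  Western: TWesternString;
  Cyrillic: TCyrillicString;
begin
  Source := 'abc';
  Western := Source;    // converted to the declared code page 1252
  Cyrillic := Source;   // converted to the declared code page 1251
  // StringCodePage is not limited to Unicode-encoded strings
  WriteLn('Western:  ', StringCodePage(Western));
  WriteLn('Cyrillic: ', StringCodePage(Cyrillic));
end.
```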


      -----------------------------------------------------------------------------------------------------------
      Shortstring

The code page of a shortstring is implicitly CP_ACP and hence will always be equal to the current value of DefaultSystemCodePage.


      PAnsiChar/AnsiChar

These types are the same as the old PChar/Char types. In all compiler modes except for /{$mode delphiunicode}/, PChar/Char are also still aliases for PAnsiChar/AnsiChar. Their code page is implicitly CP_ACP and hence will always be equal to the current value of DefaultSystemCodePage.
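A small sketch of that alias relationship, assuming {$mode objfpc}. The checks below only hold outside {$mode delphiunicode}, where Char would be a two-byte WideChar instead.

```pascal
program CharAliasDemo;

{$mode objfpc}{$H+}

var
  A: AnsiChar;
  C: Char;
  PA: PAnsiChar;
  P: PChar;
begin
  // Outside delphiunicode mode, Char is the same type as AnsiChar and
  // PChar the same type as PAnsiChar, so these assignments need no conversion
  A := 'x';
  C := A;
  PA := @A;
  P := PA;
  WriteLn('SizeOf(Char) = ', SizeOf(Char));  // 1: Char is a single byte here
  WriteLn('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));
  WriteLn('C = ', C, ', P^ = ', P^);
end.
```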


      PWideChar/PUnicodeChar and WideChar/UnicodeChar

These types remain unchanged. WideChar/UnicodeChar can contain a single UTF-16 code unit, while PWideChar/PUnicodeChar point to a single UTF-16 code unit or to an array of them.

In /{$mode delphiunicode}/, PChar becomes an alias for PWideChar/PUnicodeChar and Char becomes an alias for WideChar/UnicodeChar.
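To make "UTF-16 code unit" concrete, the following sketch stores U+1F600, a character outside the Basic Multilingual Plane, in a UnicodeString. It assumes {$mode objfpc} and builds the surrogate pair by hand, so the result does not depend on the encoding of the source file.

```pascal
program Utf16CodeUnitDemo;

{$mode objfpc}{$H+}

var
  U: UnicodeString;
  P: PWideChar;
  I: Integer;
begin
  // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 encodes it
  // as a surrogate pair: two code units, i.e. two WideChar elements
  SetLength(U, 2);
  U[1] := WideChar($D83D);  // high surrogate
  U[2] := WideChar($DE00);  // low surrogate
  WriteLn('SizeOf(WideChar) = ', SizeOf(WideChar));  // 2: one UTF-16 code unit
  WriteLn('Length(U) = ', Length(U));                // 2: counts code units, not characters
  // PWideChar points to the first code unit of the string's null-terminated array
  P := PWideChar(U);
  for I := 0 to Length(U) - 1 do
    WriteLn('code unit ', I, ': $', HexStr(LongInt(Ord(P[I])), 4));
end.
```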


      UnicodeString/WideString

These types behave the same as in previous versions:

  * /Widestring/ is the same as a "COM BSTR" on Windows, and an alias for UnicodeString on all other platforms. Its string data is encoded using UTF-16.
  * /UnicodeString/ is a reference-counted string with a maximum length of high(SizeInt) UTF-16 code units.
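Reference counting means that assigning one UnicodeString to another shares the buffer until one of them is modified (copy-on-write). A minimal sketch, assuming {$mode objfpc}; comparing the Pointer() casts of the two strings is only used here to make the sharing visible. On Windows a WideString is a COM BSTR, so this sharing should not be assumed for WideString there.

```pascal
program RefCountDemo;

{$mode objfpc}{$H+}

var
  A, B: UnicodeString;
begin
  A := 'hel';
  A := A + 'lo';   // built at run time, so A owns a heap buffer with reference count 1
  B := A;          // no characters copied: B shares A's buffer (reference count 2)
  WriteLn('shared before write: ', Pointer(A) = Pointer(B));
  B[1] := 'j';     // writing to a shared string makes B unique first (copy-on-write)
  WriteLn('A = ', A, ', B = ', B);
  WriteLn('shared after write: ', Pointer(A) = Pointer(B));
end.
```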


      Ansistring

AnsiStrings are reference-counted types with a maximum length of high(SizeInt) bytes. Additionally, they now also have code page information associated with them.

The most important thing to understand about the new AnsiString type is that it both has a declared/static/preferred/default code page (called /declared code page/ from now on), and a dynamic code page. The declared code page tells the compiler that when assigning something to that AnsiString, it should first convert the data to that declared code page (except if it is CP_NONE, see RawByteString <http://wiki.freepascal.org/FPC_Unicode_support#RawByteString> below). The dynamic code page is a property of the AnsiString which, similar to the length and the reference count, defines the actual code page of the data currently held by that AnsiString.
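The difference between the declared and the dynamic code page can be observed with StringCodePage. The sketch below assumes FPC 3.0 or later, the cwstring unit on Unix-like systems, and a hypothetical TCP1252String type. Assignments to types with a declared code page convert the data; assigning to a RawByteString (declared CP_NONE) keeps the source's dynamic code page; and SetCodePage with Convert = False changes only the label.

```pascal
program DeclaredVsDynamicDemo;

{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // widestring manager, needed for code page conversions on Unix
{$endif}

type
  // Hypothetical type name; declared code page 1252 (Windows-1252)
  TCP1252String = type AnsiString(1252);

var
  U: UnicodeString;
  W: TCP1252String;
  U8: UTF8String;
  R: RawByteString;
begin
  U := WideChar($E9);  // U+00E9 (e with acute) as a single UTF-16 code unit

  // Declared code page: the assignment converts the data first
  W := U;              // 1 byte ($E9), dynamic code page 1252
  U8 := W;             // converted again: 2 bytes ($C3 $A9), dynamic code page CP_UTF8
  WriteLn('W:  ', Length(W), ' byte(s), code page ', StringCodePage(W));
  WriteLn('U8: ', Length(U8), ' byte(s), code page ', StringCodePage(U8));

  // RawByteString is declared as CP_NONE: no conversion on assignment,
  // so R keeps the dynamic code page of W
  R := W;
  WriteLn('R:  ', Length(R), ' byte(s), code page ', StringCodePage(R));

  // Convert = False only relabels the dynamic code page (after making R unique);
  // the byte $E9 is left as-is, even though it is not valid UTF-8
  SetCodePage(R, CP_UTF8, False);
  WriteLn('R relabeled: ', Length(R), ' byte(s), code page ', StringCodePage(R));
end.
```

After the relabeling, R claims to hold UTF-8 while its single byte is not valid UTF-8; passing True as the third argument of SetCodePage would convert the data instead.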
-----------------------------------------------------------------------------------------------------------

with

CP_ACP: this value represents the currently set "default system code page". See #Code page settings <http://wiki.freepascal.org/FPC_Unicode_support#Code_page_settings> for more information.