[fpc-pascal] Concatenating CP Strings
Jonas Maebe
jonas at freepascal.org
Sun Sep 16 14:31:06 CEST 2018
On 16/09/18 13:31, Martok wrote:
> Let's say the user directs a program to "treat this file as $codepage".
> Therefore, I need to read it as this codepage and fill internal data structures
> with strings in that codepage, while keeping other operations in the system
> codepage (so I can't just change DefaultSystemCodepage). Does that mean that
> there is no way to do this with native strings?
I can only guess at the ideas behind Embarcadero's introduction of
the codepage-aware ansistrings, but I think the main purpose was to make
it easier to convert existing code that was written for a particular
system codepage into a program that works with unicodestring.
Hence, the codepage-aware string functionality supports setting the
wanted code page at the input and output level, and everything in
between is expected to be performed using either unicodestring or
DefaultSystemCodePage. FPC slightly extended this so that the encoding
of system file names (and the code page returned by related routines)
can be specified differently, so that you can easily set
DefaultSystemCodePage to CP_UTF8 (or something else) regardless of what
codepage the system's APIs used by the RTL expect.
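
As a rough illustration of the separation described above, the following sketch changes only the code page assumed for plain ansistrings and then prints the three RTL defaults. It assumes FPC 3.0 or later; the program name is made up for the example.

```pascal
program cpdefaults;
{$mode objfpc}{$H+}
begin
  // Change only the code page assumed for plain AnsiString/String data.
  SetMultiByteConversionCodePage(CP_UTF8);

  // The file-system related defaults are separate variables. They can be
  // changed with SetMultiByteFileSystemCodePage and
  // SetMultiByteRTLFileSystemCodePage, which this sketch does not call.
  WriteLn('DefaultSystemCodePage:        ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage:    ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage: ', DefaultRTLFileSystemCodePage);
end.
```
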
In general, a program will seldom have built-in support for analysing
and manipulating strings in every possible codepage in existence. The
general paradigm is to convert a string to a single encoding that is
used internally, perform analysis/processing using this encoding, and
then convert it back. In fact, that is what many runtime library
routines also do because most OS library functions only support a very
select number of codepages (or the OS library functions do it themselves
internally).
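
A minimal sketch of that convert/process/convert-back pattern, assuming FPC 3.0 or later and using UnicodeString as the internal encoding. The ExternalCP constant and the three sample bytes are placeholders for whatever the user or the input file would supply.

```pascal
program roundtrip;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix-like systems
  SysUtils;
const
  ExternalCP = 1252;  // hypothetical: would normally be chosen by the user
var
  src, dst: RawByteString;
  wide: UnicodeString;
begin
  // Three bytes standing in for data read from a file.
  SetLength(src, 3);
  src[1] := #$E4;
  src[2] := #$F6;
  src[3] := #$FC;
  // Label the bytes with their code page; Convert=False leaves them untouched.
  SetCodePage(src, ExternalCP, False);

  // Convert in: the string's dynamic code page drives the conversion.
  wide := UnicodeString(src);

  // Process in the single internal encoding.
  wide := wide + '!';

  // Convert out: go through UTF-8, then convert to the external code page.
  dst := UTF8Encode(wide);
  SetCodePage(dst, ExternalCP, True);

  WriteLn(Length(wide), ' UTF-16 code units, ', Length(dst), ' bytes');
end.
```
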
If you don't care about the codepage and won't perform any processing
that depends on the code page, then codepage-aware strings are probably
the wrong data structure. Plain arrays of bytes may be more appropriate.
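
If the data really is treated as opaque bytes, one option is a dynamic byte array such as TBytes. The sketch below is only an illustration; LoadBytes is a hypothetical helper, and the file name is taken from the command line.

```pascal
program opaquebytes;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

// Hypothetical helper: read a whole file into a byte array, with no
// code page attached and no conversion performed.
function LoadBytes(const FileName: string): TBytes;
var
  fs: TFileStream;
begin
  Result := nil;
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, fs.Size);
    if Length(Result) > 0 then
      fs.ReadBuffer(Result[0], Length(Result));
  finally
    fs.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: opaquebytes <file>');
    Halt(1);
  end;
  WriteLn(Length(LoadBytes(ParamStr(1))), ' bytes read');
end.
```
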
Alternatively, you can set the codepage of your text file to
DefaultSystemCodePage and read a regular ansistring from it. You can
still force the code page of the string you read afterwards to something
else using SetCodePage() (with Convert=False) if you wish to use the equivalent of an
explicit typecast at the string codepage level. But indeed, this is not
a workflow that the codepage-aware strings support without extra work on
your part, and as explained above, I don't think it was ever intended to
be either.
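
The workaround described above could look roughly like this. It is a sketch, assuming FPC 3.0 or later, where the RTL routine that relabels a string is SetCodePage with Convert set to False. UserCodePage is a placeholder for the code page the user asked for, and the file name comes from the command line.

```pascal
program relabel;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // code page conversion support on Unix-like systems
  SysUtils;
const
  UserCodePage = 1251;  // hypothetical: "treat this file as $codepage"
var
  f: Text;
  line: AnsiString;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: relabel <file>');
    Halt(1);
  end;

  Assign(f, ParamStr(1));
  // Tell the RTL the file is already in the system code page, so that
  // ReadLn into a regular ansistring does not convert anything.
  SetTextCodePage(f, DefaultSystemCodePage);
  Reset(f);
  try
    while not Eof(f) do
    begin
      ReadLn(f, line);
      if line <> '' then
      begin
        // Relabel the bytes without converting them.
        SetCodePage(RawByteString(line), UserCodePage, False);
        WriteLn('dynamic code page is now ', StringCodePage(line));
      end;
    end;
  finally
    Close(f);
  end;
end.
```

After the relabelling, later conversions (for example an assignment to a unicodestring) should interpret the bytes according to UserCodePage rather than DefaultSystemCodePage.
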
>> TL;DR: "AnsiString"/"String" is a type that has the code page that was
>> determined at startup, not one that turns itself into whatever code page
>> gets thrown at it
> Actually, there is a String type that is just that (at least according to the
> wiki): RawByteString. Supposedly, it just accepts any dynamic codepage without
> conversion. But it doesn't work for either of the cases here?
RawByteString is something that is largely undocumented by Embarcadero.
I tried my best to make the behaviour as compatible as possible with
Delphi, but there are still bugs in it (and holes in my knowledge about
how exactly it is supposed to behave in all possible situations).
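
For the simplest case, passing strings to a RawByteString parameter, a small test along these lines can be used to check what a given compiler version does. It is a sketch, not a statement of guaranteed behaviour; TCP1252String and Report are names invented for the example.

```pascal
program rawparam;
{$mode objfpc}{$H+}
type
  TCP1252String = type AnsiString(1252);

// A RawByteString parameter is meant to accept any ansistring without
// conversion, so the dynamic code page of the argument stays visible here.
procedure Report(const s: RawByteString);
begin
  WriteLn(Length(s), ' bytes, dynamic code page ', StringCodePage(s));
end;

var
  a: TCP1252String;
  b: UTF8String;
begin
  a := 'abc';
  b := 'abc';
  Report(a);
  Report(b);
end.
```

If the two calls report different code pages, the parameter passing itself did no conversion. Operations such as concatenating the two values are a separate matter and are best tested on the specific FPC version in use.
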
Jonas