[fpc-pascal] Concatenating CP Strings

Jonas Maebe jonas at freepascal.org
Sun Sep 16 14:31:06 CEST 2018


On 16/09/18 13:31, Martok wrote:
> Let's say the user directs a program to "treat this file as $codepage".
> Therefore, I need to read it as this codepage and fill internal data structures
> with strings in that codepage, while keeping other operations in the system
> codepage (so I can't just change DefaultSystemCodepage). Does that mean that
> there is no way to do this with native strings?

I can only guess at the ideas behind Embarcadero's introduction of 
the codepage-aware ansistrings, but I think the main purpose was to make 
it easier to convert existing code that was written for a particular 
system codepage into a program that works with unicodestring.

Hence, the codepage-aware string functionality supports setting the 
wanted code page at the input and output level, and everything in 
between is expected to be performed using either unicodestring or 
DefaultSystemCodePage. FPC slightly extended this so that the encoding 
of system file names (and the code page returned by related routines) 
can be specified differently, so that you can easily set 
DefaultSystemCodePage to CP_UTF8 (or something else) regardless of what 
codepage the system's APIs used by the RTL expect.
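
As a rough illustration of the separation described above, the sketch 
below sets the code page used for plain ansistrings independently of the 
one used for file names. It assumes FPC 3.0 or later, where the system 
unit provides these routines and variables; the choice of CP_UTF8 is only 
an example.

```pascal
program CodePageSettings;
{$mode objfpc}{$H+}

begin
  // Code page assumed for plain AnsiString/String variables
  // (this changes DefaultSystemCodePage).
  SetMultiByteConversionCodePage(CP_UTF8);

  // Code page of file names returned by RTL routines
  // (this changes DefaultRTLFileSystemCodePage).
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

  // DefaultFileSystemCodePage is deliberately left alone here, so the
  // RTL keeps passing file names to the OS in the encoding it expects.
  WriteLn('DefaultSystemCodePage:        ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage:    ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage: ', DefaultRTLFileSystemCodePage);
end.
```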

In general, a program will seldom have built-in support for analysing 
and manipulating strings in every possible codepage in existence. The 
general paradigm is to convert a string to a single encoding that is 
used internally, perform analysis/processing using this encoding, and 
then convert it back. In fact, that is what many runtime library 
routines also do because most OS library functions only support a very 
select number of codepages (or the OS library functions do it themselves 
internally).
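
A minimal sketch of that convert/process/convert-back paradigm, assuming 
code page 1252 at the input and output and unicodestring as the internal 
encoding. The type name TCP1252String and the variable names are made up 
for the example. On Unix-like systems a widestring manager such as 
cwstring is assumed to be needed for the conversions.

```pascal
program ConvertProcessConvert;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // installs a widestring manager that can convert code pages
{$endif}

type
  // Hypothetical name: an ansistring with a fixed (static) code page.
  TCP1252String = type AnsiString(1252);

var
  SrcText, DstText: TCP1252String;
  Internal: UnicodeString;
begin
  // Stand-in for input data: "caf" followed by byte $E9 (e-acute in 1252).
  SetLength(SrcText, 4);
  SrcText[1] := 'c';
  SrcText[2] := 'a';
  SrcText[3] := 'f';
  SrcText[4] := AnsiChar($E9);

  // Input boundary: convert from code page 1252 to the internal encoding.
  Internal := UnicodeString(SrcText);

  // All processing happens on the internal encoding only.
  Internal := Internal + '!';
  WriteLn('UTF-16 code units: ', Length(Internal));

  // Output boundary: the assignment converts back to code page 1252
  // (the compiler may warn about a possibly lossy implicit conversion).
  DstText := Internal;
  WriteLn('Bytes after converting back: ', Length(DstText));
end.
```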

If you don't care about the codepage and won't perform any processing 
that depends on the code page, then codepage-aware strings are probably 
the wrong data structure. Arrays may be more appropriate.
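
For instance, data that is only stored and passed along could be kept in 
a dynamic array of bytes, which never triggers a code page conversion. 
This is only a sketch; the byte values are arbitrary stand-ins for data 
read from a file.

```pascal
program RawBytes;
{$mode objfpc}{$H+}

uses
  SysUtils;  // for TBytes and IntToHex

var
  Data: TBytes;
  i: Integer;
begin
  // Stand-in for bytes read from a file in some unknown code page.
  SetLength(Data, 3);
  Data[0] := $C4;
  Data[1] := $D6;
  Data[2] := $DC;

  // The bytes are stored and copied as-is; no code page is attached.
  for i := 0 to High(Data) do
    Write(IntToHex(Data[i], 2), ' ');
  WriteLn;
end.
```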

Alternatively, you can set the codepage of your text file to 
DefaultSystemCodePage and read a regular ansistring from it. You can 
still force the code page of the string you read afterwards to something 
else using SetCodePage() with Convert set to False if you wish to use 
the equivalent of an explicit typecast at the string codepage level. But 
indeed, this is not a workflow that the codepage-aware strings support 
without extra work on your part, and as explained above, I don't think 
it was ever intended to be either.
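
A sketch of that alternative, using the RTL's SetTextCodePage() and 
SetCodePage() (the latter with Convert set to False so that only the 
label changes and the bytes stay untouched). The file name and the 
user-selected code page are hypothetical, and the file is assumed to 
exist.

```pascal
program ReadThenRelabel;
{$mode objfpc}{$H+}

const
  FileName     = 'input.txt';  // hypothetical input file
  UserCodePage = 1252;         // hypothetical: "treat this file as 1252"

var
  F: Text;
  Line: AnsiString;
begin
  Assign(F, FileName);
  // Tell the RTL the file is in the same code page as plain ansistrings,
  // so reading performs no conversion.
  SetTextCodePage(F, DefaultSystemCodePage);
  Reset(F);
  while not EOF(F) do
  begin
    ReadLn(F, Line);
    // Relabel the bytes without converting them (Convert = False).
    SetCodePage(RawByteString(Line), UserCodePage, False);
    WriteLn(Length(Line), ' bytes, dynamic code page ',
            StringCodePage(Line));
  end;
  Close(F);
end.
```

Note that Line is still declared as a plain ansistring, so assigning it 
to a string with a different declared code page, or to a unicodestring, 
converts from the code page set here.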

>> TL;DR: "AnsiString"/"String" is a type that has the code page that was
>> determined at startup, not one that turns itself into whatever code page
>> gets thrown at it
> Actually, there is a String type that is just that (at least according to the
> wiki): RawByteString. Supposedly, it just accepts any dynamic codepage without
> conversion. But it doesn't work for either of the cases here?

RawByteString is something that is largely undocumented by Embarcadero. 
I tried my best to make the behaviour as compatible as possible with 
Delphi, but there are still bugs in it (and holes in my knowledge about 
how exactly it is supposed to behave in all possible situations).


Jonas


