[fpc-pascal] Convert codepages back to UTF8
Martok
listbox at martoks-place.de
Tue May 28 16:57:59 CEST 2019
>> Although be advised that if your SystemCodePage is not a Unicode codepage, there
>> will be data loss due to (sometimes unexpected) internal conversions, regardless
>> of the current dynamic string code page.
>
> As Graeme wrote that shouldn't be the case when converting to UTF-8. And for
> everything else you need to either use string variables with the correct static
> encoding or RawByteString to avoid conversions.
As I wrote: "if your SystemCodePage is not a Unicode codepage". If it is,
everything mostly works.
And even RawByteString gets unexpected round-trip conversions on some operations,
which breaks in funny ways if the SystemCodePage can't represent some characters
in the RBS. I once spent most of a day debugging seemingly random data
corruption until I realized the corrupted bytes were #$81, #$90 etc., which have
no assigned character in CP 1252, and the non-LCL program used CP 1252.
More interesting for Alexey regarding the follow-up question: the result of any
string operation is in the DefaultSystemCodePage, for example:
  s := 'abc';
  SetCodePage(RawByteString(s), CP_UTF8, true);
  WriteLn(s, ' ', StringCodePage(s)); // abc 65001
  s := s + 'd';
  WriteLn(s, ' ', StringCodePage(s)); // abcd 1252 (with DefaultSystemCodePage = 1252)
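
One possible way around this, as a minimal sketch and assuming the data really is meant to stay UTF-8: declare the variable as UTF8String rather than plain string, so that the destination of the concatenation has the static code page CP_UTF8 and the result should not be converted to DefaultSystemCodePage. The program and variable names are made up for illustration.

```pascal
program Utf8Concat; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
var
  u: UTF8String; // static code page CP_UTF8 instead of CP_ACP
begin
  u := 'abc';
  WriteLn(u, ' ', StringCodePage(u));
  // The destination is a UTF8String, so the concatenation result is
  // expected to stay in CP_UTF8 rather than DefaultSystemCodePage.
  u := u + 'd';
  WriteLn(u, ' ', StringCodePage(u));
end.
```

Alternatively, calling SetMultiByteConversionCodePage(CP_UTF8) early in the program changes DefaultSystemCodePage itself, which is roughly what LCL applications do. Whether that is acceptable depends on what the rest of the program expects.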
--
Regards,
Martok