[fpc-pascal] Convert codepages back to UTF8

Martok listbox at martoks-place.de
Tue May 28 16:57:59 CEST 2019


>     Although be advised that if your SystemCodePage is not a Unicode codepage, there
>     will be data loss due to (sometimes unexpected) internal conversions, regardless
>     of the current dynamic string code page.
> 
> 
> As Graeme wrote that shouldn't be the case when converting to UTF-8. And for
> everything else you need to either use string variables with the correct static
> encoding or RawByteString to avoid conversions. 

As I wrote: "if your SystemCodePage is not a Unicode codepage". If it is,
everything mostly works.
And even RawByteString gets unexpected round-trip conversions on some operations,
which breaks in funny ways if the SystemCodePage can't represent some of the
characters in the RBS. I once spent most of a day debugging seemingly random data
corruption until I realized that the corrupted bytes were #$81, #$90 etc., which
are unassigned in CP 1252, and that the non-LCL program used CP 1252.

More interesting for Alexey regarding the follow-up question: the result of any
string operation is in the DefaultSystemCodePage, for example:

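  s := 'abc';
  SetCodePage(RawByteString(s), CP_UTF8, true);   // convert s to UTF-8
  WriteLn(s, ' ', StringCodePage(s));             // abc 65001
  s := s + 'd';                                   // result comes back in DefaultSystemCodePage
  WriteLn(s, ' ', StringCodePage(s));             // abcd 1252 (DefaultSystemCodePage here)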
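
For contrast, here is a minimal sketch of the "correct static encoding" approach mentioned in the quoted reply. It assumes a variable declared as UTF8String (the program name and the variable u are only illustrative). Because the declared code page of the target is CP_UTF8, the concatenation result should stay in UTF-8 instead of falling back to DefaultSystemCodePage:

```pascal
program KeepUtf8;
{$mode objfpc}{$H+}

var
  u: UTF8String;  // static code page CP_UTF8, unlike a plain AnsiString
begin
  u := 'abc';
  WriteLn(u, ' ', StringCodePage(u));
  // The target of the assignment is a UTF8String, so the concatenation
  // is carried out in CP_UTF8 rather than in DefaultSystemCodePage.
  u := u + 'd';
  WriteLn(u, ' ', StringCodePage(u));
end.
```

This only covers the case where UTF-8 is the desired encoding throughout. Passing u to a routine that takes a plain AnsiString parameter can still trigger a conversion to DefaultSystemCodePage.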


--
Regards,
Martok



