[fpc-pascal] Delphi / FPC and UTF8 BOM
Jonas Maebe
jonas.maebe at elis.ugent.be
Sat Oct 18 20:17:49 CEST 2008
On 18 Oct 2008, at 16:32, Zaher Dirkey wrote:
> I notice that UTF-8 from Delphi is not compatible with Lazarus/FPC and
> vice versa.
> It corrupts the characters.
When you set the encoding of an FPC source file to UTF-8 (either by
adding a BOM or by using {$codepage utf-8}), then:
a) all constant strings containing UTF-8 characters will be decoded
and converted to UTF-16 (widestring)
b) at run time, these widestrings will again be converted to the
active code page when assigning them to ansistrings or shortstrings
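
A minimal sketch of (a) and (b), assuming the file is saved as UTF-8. The
program and variable names are made up for illustration, and the cwstring
line assumes a Unix-like system where a widestring manager has to be
installed for the run-time conversion to work:

```pascal
program CodepageDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8}  // declares the source file encoding as UTF-8

// Assumption: on Unix-like systems a widestring manager is needed
// for widestring -> ansistring conversions.
{$ifdef unix}uses cwstring;{$endif}

var
  W: WideString;
  A: AnsiString;
begin
  W := 'é';  // (a) the constant was decoded from UTF-8 at compile time
  A := W;    // (b) converted to the active code page at run time
  WriteLn('UTF-16 code units in W: ', Length(W));
  WriteLn('bytes in A:             ', Length(A));
end.
```

What ends up in A depends on the active code page of the system the
program runs on, not on the encoding of the source file.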
If a file contains neither a BOM nor a {$codepage yyy} statement, then
constant strings are not parsed in any way and will appear literally
in the compiled program (and when assigning them to a widestring, they
will be "converted" from ansi to UTF-16 and hence contain garbage at
the end).
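
The case without a BOM or codepage directive might look roughly like the
sketch below. It assumes the file is saved as UTF-8 without a BOM and that
the active ansi code page is something other than UTF-8; the names are
again hypothetical:

```pascal
program RawBytesDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}
// Deliberately no BOM and no codepage directive in this file.

// Assumption: on Unix-like systems a widestring manager is needed
// for ansistring -> widestring conversions.
{$ifdef unix}uses cwstring;{$endif}

var
  A: AnsiString;
  W: WideString;
begin
  A := 'é';  // the bytes are copied exactly as stored in the source file
  W := A;    // bytes are interpreted in the active ansi code page,
             // so UTF-8 input is mis-decoded when that page is not UTF-8
  WriteLn('bytes in A:             ', Length(A));
  WriteLn('UTF-16 code units in W: ', Length(W));
end.
```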
This means that if you (ab)use ansistrings to store UTF-8 strings
(rather than strings in the current ansi encoding), you either have to
a) not use a BOM/codepage, or
b) use UTF8Encode(widestringconstant)
Otherwise characters not representable using the active code page will
disappear during the run time conversion from widestring to ansistring
(and you won't end up with a UTF-8 string in any case).
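
Option (b) could be sketched as follows, assuming the file is saved as
UTF-8. The names are hypothetical. The result is declared as UTF8String,
which as far as I know was a plain alias of ansistring in the compiler
versions of that time; on newer, code-page-aware compilers this choice
should keep the bytes from being converted again on assignment:

```pascal
program Utf8EncodeDemo;  // hypothetical name, for illustration only

{$mode objfpc}{$H+}
{$codepage utf8}  // constants are decoded to UTF-16 at compile time

var
  W: WideString;
  S: UTF8String;  // assumption: stands in for the ansistring holding UTF-8
begin
  W := 'Grüße';        // stored as UTF-16, nothing is lost here
  S := UTF8Encode(W);  // explicit UTF-16 -> UTF-8, bypassing the
                       // active code page entirely
  WriteLn('UTF-8 bytes in S: ', Length(S));
end.
```

Because the conversion goes straight from UTF-16 to UTF-8, characters that
the active code page cannot represent are not dropped along the way.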
Jonas