[fpc-pascal] Delphi / FPC and UTF8 BOM

Jonas Maebe jonas.maebe at elis.ugent.be
Sat Oct 18 20:17:49 CEST 2008


On 18 Oct 2008, at 16:32, Zaher Dirkey wrote:

> I noticed that UTF-8 from Delphi is not compatible with Lazarus/FPC and
> vice versa.
> It corrupts the characters.

When you set the encoding of an FPC source file to UTF-8 (either by  
adding a BOM or by using {$codepage utf-8}), then
a) all constant strings containing utf-8 characters will be decoded  
and converted to utf-16 (widestring)
b) at run time, these widestrings will again be converted to the  
active code page when assigning them to ansistrings or shortstrings
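A minimal sketch of the two steps above, assuming the file is saved as UTF-8. The program and variable names are made up for illustration, and compiler versions newer than the one discussed here may handle the conversions differently:

```pascal
program cpdemo;
{$mode objfpc}{$H+}
{$codepage utf8}   // declares that this source file is UTF-8 encoded

var
  w: widestring;
  a: ansistring;
begin
  // (a) the constant is decoded from UTF-8 by the compiler and kept as UTF-16
  w := 'é';
  writeln('widestring length: ', length(w));   // counts UTF-16 code units

  // (b) assigning the same constant to an ansistring converts it to the
  // active code page; the result depends on that code page
  a := 'é';
  writeln('ansistring length: ', length(a));   // counts bytes after conversion
end.
```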

If a file contains neither a BOM nor a {$codepage yyy} directive, then
constant strings are not parsed in any way and will appear literally
in the compiled program (and when assigning them to a widestring, they
will be "converted" from ansi to utf-16 and hence contain garbage at
the end).
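One way to check what "appear literally" means is to dump the bytes of a constant. This is only a sketch, assuming the file is saved as UTF-8 without a BOM; the program name is hypothetical:

```pascal
program rawdemo;
{$mode objfpc}{$H+}
// Deliberately no BOM and no {$codepage ...} directive in this file.

var
  a: ansistring;
  i: integer;
begin
  a := 'é';   // the bytes of the literal are taken from the source unchanged

  // Print each stored byte in hex to see what ended up in the program
  for i := 1 to length(a) do
    write(hexstr(ord(a[i]), 2), ' ');
  writeln;
end.
```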

This means that if you (ab)use ansistrings to store utf-8 strings  
(rather than strings in the current ansi-encoding), you either have to
a) not use a bom/codepage, or
b) use UTF8Encode(widestringconstant)

Otherwise, characters that cannot be represented in the active code page
will disappear during the run-time conversion from widestring to ansistring
(and you won't end up with a UTF-8 string in any case).
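Option b) could look roughly like this. It is a sketch under the assumption that the file is saved as UTF-8; the program and variable names are invented, and the result is stored in a UTF8String (an ansistring type in the compiler versions discussed here) so that the intent stays explicit:

```pascal
program utf8demo;
{$mode objfpc}{$H+}
{$codepage utf8}   // the constant below is parsed as UTF-8 and kept as UTF-16

var
  w: widestring;
  s: utf8string;
  i: integer;
begin
  w := 'héllo';          // widestring constant, no code page conversion here
  s := UTF8Encode(w);    // explicit UTF-16 -> UTF-8, bypassing the active code page

  // Print the resulting bytes in hex to verify that they are UTF-8
  for i := 1 to length(s) do
    write(hexstr(ord(s[i]), 2), ' ');
  writeln;
end.
```

Option a) needs no code change at all: the same literal in a file without a BOM or codepage directive is simply stored as the bytes found in the source.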


Jonas


