[fpc-pascal] Delphi / FPC and UTF8 BOM
Marco van de Voort
marcov at stack.nl
Wed Oct 22 12:33:59 CEST 2008
In our previous episode, Jonas Maebe said:
> > There has been a lot of discussion about this problem. What happens is
> > that FPC wishes to always have ansistrings holding system locale
> > encoded strings, it's impossible to have strings which store utf-8
> > data as far as FPC is concerned.
>
> And the reason is that
> a) if you mix system and non-system encodings in ansistrings, then a
> bunch of string conversions between ansistrings and widestrings will
> go horribly wrong
> b) if you only use a particular non-system encoding for ansistrings,
> then interfacing with OS routines will break down completely
>
> It is possible to solve b) by manually adding necessary extra string
> conversions everywhere in the RTL where ansistrings are passed to OS
> routines, but that is a lot of work (both to implement and to
> maintain) and very error prone. Then it's indeed much cleaner to
> simply introduce a new string type which does not have to be
> compatible with the OS encoding.
Tiburon's solution is the same as Florian's original design for the
multi-encoding unicode string type TUnicodeString (which for now is still
UTF-16 only): add an encoding field to ansistring, and let an ansistring
declaration specify an encoding type:
  Type
    TUtf8String = ansistring (cp_UTF8);
This way you can explicitly flag anything internal as UTF-8, and
communicate with the outside 1-byte world using the native codepage (which
might be UTF-8 too, if desired).
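
As a rough sketch of how this would look in use, assuming a compiler that
implements codepage-tagged ansistrings (Delphi 2009 and later do, as does
FPC from 3.0 on). The program name, variable names and sample text are only
illustrative, and the "type" keyword in front of AnsiString is the spelling
those compilers accept. cwstring is pulled in on Unix because the implicit
conversions there rely on an installed widestring manager:

```pascal
program Utf8Demo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // installs a widestring manager so conversions work on Unix
{$endif}

type
  // Assumption: compiler supports an encoding parameter on ansistring.
  TUtf8String = type AnsiString(CP_UTF8);

var
  W: UnicodeString;  // UTF-16 text
  U: TUtf8String;    // explicitly flagged as UTF-8
  A: AnsiString;     // native (system) codepage
begin
  W := 'caf';
  W := W + WideChar($00E9);  // append U+00E9, a non-ASCII character

  U := W;  // implicit conversion UTF-16 -> UTF-8, driven by the type
  A := U;  // implicit conversion UTF-8 -> native codepage

  // Each string carries its own encoding, which can be queried at runtime.
  WriteLn('codepage of U: ', StringCodePage(U));
  WriteLn('codepage of A: ', StringCodePage(A));
  WriteLn('UTF-16 length: ', Length(W), ', UTF-8 byte length: ', Length(U));
end.
```

The point is that nothing above calls a conversion routine by hand: the
encoding attached to each string type tells the compiler where a conversion
is needed, so only the assignment to A touches the native codepage.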
The solution has Windows written all over it (including viewing UTF-8 as a
codepage), but it has merits IMHO.