[fpc-pascal] Delphi / FPC and UTF8 BOM

Marco van de Voort marcov at stack.nl
Wed Oct 22 12:33:59 CEST 2008


In our previous episode, Jonas Maebe said:
> > There has been a lot of discussion about this problem. What happens is
> > that FPC wishes to always have ansistrings holding system locale
> > encoded strings, it's impossible to have strings which store utf-8
> > data as far as FPC is concerned.
> 
> And the reason is that
> a) if you mix system and non-system encodings in ansistrings, then a  
> bunch of string conversions between ansistrings and widestrings will  
> go horribly wrong
> b) if you only use a particular non-system encoding for ansistrings,  
> then interfacing with OS routines will break down completely
> 
> It is possible to solve b) by manually adding necessary extra string  
> conversions everywhere in the RTL where ansistrings are passed to OS  
> routines, but that is a lot of work (both to implement and to  
> maintain) and very error prone. Then it's indeed much cleaner to  
> simply introduce a new string type which does not have to be  
> compatible with the OS encoding.
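Point a) can be made concrete with a small sketch. It assumes FPC 3.0 or later, where an AnsiString carries a code page tag at runtime (newer than this thread), and it uses ISO-8859-1 (code page 28591) as a stand-in for the system locale encoding; both are assumptions for illustration. The same five UTF-8 bytes are converted to a widestring once with the correct tag and once as if they were in the system encoding:

```pascal
program MixedEncodings;

{$mode objfpc}{$H+}

// Sketch assuming FPC 3.0+ (code-page-aware AnsiString).
// Code page 28591 (ISO-8859-1) is an assumed stand-in for the
// "system locale" single-byte encoding.
{$ifdef unix}
uses
  cwstring; // lets the RTL convert between code pages via iconv on Unix
{$endif}

var
  W, Good, Bad: UnicodeString;
  Utf8Bytes, Mislabeled: RawByteString;
begin
  W := 'caf';
  W := W + WideChar($E9);          // "café": 4 characters

  // 5 bytes: 'c' 'a' 'f' $C3 $A9, tagged as CP_UTF8
  Utf8Bytes := UTF8Encode(W);

  // Decoding with the correct tag gives back the original 4 characters
  Good := Utf8Bytes;

  // Same bytes, but claimed to be in the single-byte "system" encoding,
  // which is what UTF-8 data in a plain system-encoded ansistring amounts to
  Mislabeled := Utf8Bytes;
  SetCodePage(Mislabeled, 28591, False);  // relabel only, no conversion
  Bad := Mislabeled;                      // each byte becomes one character

  WriteLn('correctly tagged:  ', Length(Good), ' characters');
  WriteLn('mislabeled:        ', Length(Bad), ' characters');
end.
```

The mislabeled copy turns the two-byte UTF-8 sequence for "é" into two separate characters ("Ã©"), the kind of silent corruption described in a).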

Tiburon's (Delphi 2009) solution is the same as Florian's original design for the
multi-encoding string type TUnicodeString (which for now is still UTF-16 only):
add an encoding field to ansistring, and let an ansistring type declaration
specify an encoding:

Type
  TUtf8String = type AnsiString(CP_UTF8);

This way you can explicitly flag all internal data as UTF-8, and
communicate with the outside 1-byte world using the native codepage (which
might be UTF-8 too, if desired).
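A minimal sketch of how such a declaration is used, assuming the `type AnsiString(codepage)` form that FPC 3.0 later adopted (the same form Delphi 2009 uses). The "native" type below is tagged with ISO-8859-1 (28591) purely as a hypothetical stand-in for the OS codepage; on a real system the boundary type would follow the locale. Assigning between differently tagged types makes the compiler insert the conversion:

```pascal
program Utf8Internal;

{$mode objfpc}{$H+}

// Sketch assuming FPC 3.0+ syntax for code-page-tagged AnsiString types.
// Code page 28591 (ISO-8859-1) is a hypothetical stand-in for the native
// 1-byte code page used when talking to the OS.
{$ifdef unix}
uses
  cwstring; // iconv-based code page conversions on Unix
{$endif}

type
  TUtf8String   = type AnsiString(CP_UTF8);  // internal data, always UTF-8
  TNativeString = type AnsiString(28591);    // "outside world" encoding (assumed)

var
  W: UnicodeString;
  U: TUtf8String;
  N: TNativeString;
begin
  W := 'caf';
  W := W + WideChar($E9);   // "café"

  U := W;  // compiler inserts a UTF-16 -> UTF-8 conversion
  N := U;  // and a UTF-8 -> ISO-8859-1 conversion at the boundary

  WriteLn('internal: code page ', StringCodePage(U), ', ', Length(U), ' bytes');
  WriteLn('native:   code page ', StringCodePage(N), ', ', Length(N), ' bytes');
end.
```

Because the conversion happens on assignment, only the boundary types need to know the OS encoding; the UTF-8 data itself is never mislabeled as system-encoded.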

The solution has Windows written all over it (including treating UTF-8 as just
another codepage), but it has merits IMHO.


