[fpc-devel] Unicode support (yet again)

Graeme Geldenhuys graemeg.lists at gmail.com
Wed Sep 14 11:48:49 CEST 2011


On 14/09/2011 11:19, Luiz Americo Pereira Camara wrote:
> This is not desirable simply because at each platform (windows / unix) 
> the user code of the same program will have a different encoding 
> increasing the possibility of subtle errors.

Why? Not every program is a text manipulation program or text parser.
Most programs simply assign one string to another.

eg:

   Button1.Caption := 'Click me';
   lMyString := Button1.Caption;


Under Unix systems 'Click me', Button1.Caption and lMyString will be
UTF-8 encoded. Under Windows 'Click me', Button1.Caption and lMyString
will be UTF-16 encoded.

When Lazarus saves this information in a .lfm file, it will be stored as
UTF-8 irrespective of the platform. This is normal behaviour on all
platforms already, and already done in Lazarus too.

As for streaming, the same applies as for saving to file. UTF-8 is
ideally suited for (and was designed for simplifying) streaming, hence
the W3C promotes the usage of UTF-8 in HTML, XML etc.
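
To illustrate the point about storage, here is a minimal sketch of converting
an in-memory UTF-16 string to UTF-8 before it is written out. ToStorage is a
hypothetical helper name, not something from Lazarus or fpGUI; UTF8Encode is
the RTL routine. The non-ASCII character is appended via WideChar() so the
result does not depend on the source file encoding.

```pascal
program SaveAsUtf8Sketch;
{$mode objfpc}{$H+}

// Hypothetical helper: whatever the in-memory encoding is,
// the stored form is always UTF-8.
function ToStorage(const S: UnicodeString): UTF8String;
begin
  Result := UTF8Encode(S);
end;

var
  InMemory: UnicodeString;
  Stored: UTF8String;
begin
  InMemory := 'caf';
  InMemory := InMemory + WideChar($00E9);  // U+00E9, e with acute accent

  Stored := ToStorage(InMemory);

  WriteLn('UTF-16 code units in memory: ', Length(InMemory));  // 4
  WriteLn('UTF-8 bytes in storage:      ', Length(Stored));    // 5
end.
```

The stored form is the same on every platform, which is why the in-memory
encoding can differ without affecting files or streams.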


> Another advantage of using RTLString as i proposed is that Lazarus will 
> require almost no code change since the encoding of string in LCL will 
> be the same (UTF8) across platforms.

Lazarus, like fpGUI, will have to decide what they want to do. Stick to
having UTF-8 forced on all platforms, or use a native encoding on each
platform. Currently UTF-8 was chosen in both projects because it is so
compatible (think easy here) with AnsiString - so the least amount of work
was required and it was pretty efficient because most programs already
used AnsiString.

If I were to change fpGUI to use a native encoding on each platform, I
would simply change my definition of TfpgString as described in a
similar example before. All string manipulation inside fpGUI (and LCL)
should already have adhered to the rule that 1 byte <> 1 character, so
the rest of the framework should continue to work as normal. In the case
of fpGUI, I would also be able to get rid of all the UTF8Copy(),
UTF8Length() calls and simply use the RTL Copy() and Length() functions
again - after all, they were only introduced because FPC's RTL lacked
Unicode (any encoding) support.
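
A rough sketch of what such a per-platform definition could look like. The
conditional alias below is an assumption for illustration only; the actual
TfpgString declaration in fpGUI may differ, and CopyCaption is a hypothetical
stand-in for framework code that merely passes strings along.

```pascal
program NativeStringSketch;
{$mode objfpc}{$H+}

type
  // Assumption: one alias, switched per platform at compile time.
  {$IFDEF WINDOWS}
  TfpgString = UnicodeString;   // UTF-16 code units
  {$ELSE}
  TfpgString = AnsiString;      // bytes, holding UTF-8 by convention
  {$ENDIF}

// Hypothetical helper: code like this never inspects the encoding,
// so it compiles and behaves the same with either alias.
function CopyCaption(const ACaption: TfpgString): TfpgString;
begin
  Result := ACaption;
end;

var
  lMyString: TfpgString;
begin
  lMyString := CopyCaption('Click me');
  // Pure ASCII text has the same length in UTF-8 and UTF-16.
  WriteLn(Length(lMyString));
end.
```

Code that only assigns strings is unaffected by the switch; only code that
indexes or measures text needs to remember that one code unit is not
necessarily one character.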


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



