[fpc-devel] Unicode support (yet again)

Marco van de Voort marcov at stack.nl
Fri Sep 16 14:03:33 CEST 2011


In our previous episode, Tomas Hajny said:
>  .
> > In the UTF8 RTL, all "string"s _ARE_ utf8, unless specified otherwise (by
> > naming them unicodestring or ansistring(..encoding) or shortstrings).
> >
> > So the same virtual method with a STRING parameter will be TUnicodestring
> > in the UTF16 rtl and UTF8string in the utf8 RTL.
> 
> Sorry, one thing I'm missing in this point - where exactly is the indexed
> (SBCS codepage based) version in this if string always means either
> UnicodeString or UTF8String depending on the context / defines? Would
> there be no SBCS version any longer, or is this a third option, or what?

It is a third option, but maybe only for a while on Windows, and it is the
only option on platforms that can't or won't support Unicode (like DOS).

The idea is more or less that this trick can be employed for any
ansi/unicodestring type. The shortstring overloads are already there and
can probably stay.  So DOS or OS/2 are not in danger.

It also means three possible options for Windows. But ascii is temporary,
and Windows/utf8 is only for Lazarus.  (which I hope will see the light and
migrate to utf16 in time too)

> Was your point about "string", or "RTLString"?

I'm thinking about "string", but that is more directed towards the OOP
parts, which assume an objfpc {$h+} or Delphi mode.

So the base RTL functions like fileopen will take rawbytestring, which
accepts _all_ encodings (so also ansi/utf8/utf16 in ansi/utf8/utf16 mode),
and convert at runtime if necessary and possible, or be explicitly typed.
Whether something like RTLSTRING is possible depends on the number of
routines that don't fall into these categories.
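
A minimal sketch of the rawbytestring idea, assuming a compiler with
code-page-aware strings (StringCodePage and SetCodePage available).
OpenByName is a hypothetical stand-in for a routine like fileopen, not the
real RTL implementation, and the choice of UTF-8 as the platform-side
encoding is an assumption for illustration only:

```pascal
program rawbytesketch;
{$mode objfpc}{$H+}

{ OpenByName is a hypothetical stand-in for a base routine like fileopen;
  it is not the real RTL implementation. }
function OpenByName(const Name: RawByteString): RawByteString;
begin
  Result := Name;
  { Runtime conversion, done only when the incoming code page differs
    from what this (assumed UTF-8) platform layer wants. }
  if StringCodePage(Result) <> CP_UTF8 then
    SetCodePage(Result, CP_UTF8, True);
end;

var
  U16: UnicodeString;
  U8: UTF8String;
begin
  U16 := 'file.txt';
  U8 := UTF8Encode(U16);                        { explicit UTF-16 to UTF-8 }
  WriteLn(OpenByName(U8));                      { already UTF-8, no conversion }
  WriteLn(OpenByName(AnsiString('file.txt')));  { ansi, converted if needed }
end.
```

The parameter itself carries no fixed encoding, so the caller's string
arrives with its own code page and the routine decides whether to convert.
On some platforms the conversion may need a widestring manager to be
installed for non-ASCII data.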

Of course the "accepting" of all encodings is at the interface level.
Implementations for platforms like DOS are not supposed to support them all,
just the ones they always did.


