[fpc-devel] bug report 20473: Please add a directive to define string=utf8string

Marco van de Voort marcov at stack.nl
Thu Oct 13 11:57:00 CEST 2011


In our previous episode, Sven Barth said:
> I think he meant that if such a feature is introduced it would be a 
> natural conclusion to define "string = unicodestring" on Windows and 
> "string = utf8string" for Unix in the RTL and the FCL (and maybe "string 
> = ansistring" for DOS and OS/2). Thus those two libraries need to be 
> tested intensively that they can cope with that.

No, even worse: both on both platforms. So win32-utf8 and win32-utf16, where
the utf* designates the encoding of the default string type. And the same for
unix.  Maybe win32-ansi too.

Some of these are for the transition period and legacy only. I don't think
"ansi" versions for non windows platforms are necessary in the long run, and
maybe not even initially.  I haven't really made up my mind yet if the
1-byte target for unix should be utf8 specifically, or native encoding.

This means 2 RTLs for all major platforms (say
linux/freebsd/windows/darwin), and at least for a time an extra one (ansi,
compat with old FPC and Delphi) on windows.  The others are up to the
maintainer.

The default stringtype is set on compiler startup, but can be modified
inside the source with $H-like behaviour.
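For reference, this is how the existing $H switch already behaves in current
FPC; it is only an analogy, since the directive for picking utf8/utf16 does
not exist yet and is not shown. The program and type names are made up for
the illustration:

```pascal
program HSwitchSketch;
{$mode objfpc}

{$H-}
type
  TShortDefault = string;  { with $H- "string" means ShortString }

{$H+}
type
  TLongDefault = string;   { with $H+ "string" means AnsiString }

begin
  { A ShortString is a fixed-size buffer, an AnsiString is a managed
    pointer, so the sizes of the two "string" types differ. }
  WriteLn(SizeOf(TShortDefault));
  WriteLn(SizeOf(TLongDefault) = SizeOf(Pointer));
end.
```

The same source text "string" resolves to a different type depending on the
switch state at that point, which is the behaviour a default-encoding switch
would presumably mimic.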

So you first pick the general encoding, depending on your plans and
where your code originates from. The whole FPC distro will be compiled using
that type as much as possible. 

If you need to escape from it, you can work around it in two ways:

1. (best choice) Change the routines that need the escape to proper typing.
   Preferably update the code to narrow the spot that is encoding dependent,
   and make the rest "string".
2. Try a blanket {$H} to make the default stringtype what you want, and fix
   problems (e.g. overrides of methods, passing to var params).
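A minimal sketch of option 1, assuming a routine that really depends on the
encoding. Utf8ByteCount and Report are hypothetical names invented for this
example; UTF8String, UnicodeString and UTF8Encode are the existing RTL
types and function:

```pascal
program NarrowEncodingSketch;
{$mode objfpc}{$H+}

{ Hypothetical routine: the only spot that depends on the encoding,
  so it is typed explicitly instead of using plain "string". }
function Utf8ByteCount(const S: UTF8String): SizeInt;
begin
  { Length counts code units, which are bytes for a UTF8String. }
  Result := Length(S);
end;

{ Encoding-agnostic code keeps the default "string" type. }
procedure Report(const Caption: string; Value: SizeInt);
begin
  WriteLn(Caption, ': ', Value);
end;

var
  W: UnicodeString;
begin
  W := 'abc';
  Report('UTF-8 bytes', Utf8ByteCount(UTF8Encode(W)));
end.
```

Only Utf8ByteCount has to change if the default stringtype changes; Report
and its callers just follow whatever "string" is for the chosen RTL.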


Some of the reasons for this:
- ability to use Delphi unicode code without changes. Forever, on all
  platforms.
- ability to use current lazarus code (cleaned up a bit to use 2.7.1 string
  types), by using utf8 rtls on all platforms
- ability for lazarus to change opinions later (e.g. go to utf16 on windows)

Advantages:
- nobody gets left out. Saves a lot of discussion that will never fully die
  out.
- possibility to run a system in mostly one encoding, without micromanaging
  string types in the packages (RTL, classes, packages, lazarus + parts,
  external code) and finding some way to reconcile them on all systems.
- A scheme that is easy to understand and explain. "take the utf16 version,
  it is like D2009, period". No rewrite rules, minimal changes to the modes
  etc.
- Fully Delphi 2009 compatible options on a source level. 
- A fully old-Delphi compatible option on windows only (and on unix too if
  we change the 1-byte type to be the default encoding).
- the multiple choices give people options during the migration period.
   First pick the nearest (utf8 in Lazarus' case), then start testing
   and migrating to ideal situation.
- If problems turn up (and they will), there is a policy to deal with them,
   and deal with them properly. No workarounds, no rewrites. IOW by separating
   this, problems that I can't foresee now get a place. There is less
   chance of roadblocks and discussion points.

Problems:
1. Obviously, the release engineering effort.
2. Not all details are fleshed out, especially wrt modes. But this goes for
   all suggestions till now.
3. It needs a lot of commitment from everybody. It is revolutionary, not
   evolutionary, and cannot be presented as evolutionary. We cannot pretend
   that nothing much changes now, and then assume that everything will work
   out, or that we will find workarounds (with a bit of rewriting ...) later
   on.

I think problem 3 is bigger than the other two.


