[fpc-pascal] Unicode filenames

Sun Jun 29 13:10:33 CEST 2008

> Marco van de Voort wrote:
> > What are the exact plans of Lazarus in this? Is there some wiki page with
> > how Lazarus plans to tackle this with all multi-platform concerns? 
> 
> All strings in the LCL are UTF8, see
> http://wiki.lazarus.freepascal.org/LCL_Unicode_Support
> 
> For windows this means all strings in the LCL are converted to ansi, if 
> the OS is win9X, and to widestring, if the OS is NT or higher.
> 
> See in particular: 
> http://wiki.lazarus.freepascal.org/LCL_Unicode_Support#Dealing_with_directory_and_filenames

> IMHO there should be a better solution than to convert file and
> directory names to/from ansi. I think this is the Achilles' heel of the
> Lazarus unicode support: it depends on an RTL that has insufficient
> unicode support.

Yes. But with the above question I was asking more about what you want long
term, and the reasons behind it that we can't see (widgetset related, db
components related), not which hacks you employ now to work around that :-)

If we want to make an informed decision, we have to put all requirements on
the table, e.g.

- Base principle for me: requiring too much hand-coding is not desirable. In
  an application (not RTL/FCL) I don't want to have to insert manual
  conversions for each string operation and/or each time a string is passed.
  Some of this must be automated; that is the Delphi way IMHO.

- Which encoding(s) to support (utf-8 and/or utf-16 mostly)
- The and/or in the question above: one primary encoding or two? If you have
  just one, you have to convert on some OSes to access the API+widgetset, and
  the headers translated for those OSes must be redone with glue code doing
  the transforms. If you have two, each general purpose string routine must
  be implemented twice (or face conversion chaos).
- Keep one windows release per windows target (win32,win64) or have two
  (ansi+w9x compat, unicode+NT only)?
  This because if we significantly step up our use of the unicode API,
  keeping runtime compatibility with w9x will require a lot of hackish code.
  Split them up, and you just have a few unicode vs ansi include files, and
  way less glue code to make mistakes in.
- Do we (long term) let UTF8 piggyback on ansistring, or do we have a
  distinct type for it, so that the compiler knows that type is utf-8 (and
  consequently that ansistring isn't)? I have a feeling that this might be
  required, even if we decide on utf8 as the universal encoding, to avoid
  having to add too many checks and hand transformations.
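
To make the last point a bit more concrete, here is a rough sketch (in Python
rather than Pascal, purely as illustration) of what a distinct UTF-8 type
could buy: once the encoding is part of the type, the conversion at the OS
boundary can live in one place instead of being written by hand at every
call. The names Utf8Str, AnsiStr, to_utf8 and to_wide_api are made up for
this sketch and are not RTL or LCL identifiers, and cp1252 is only a
stand-in for whatever the system ansi codepage happens to be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utf8Str:
    """Bytes that are known to be UTF-8 (hypothetical type)."""
    data: bytes


@dataclass(frozen=True)
class AnsiStr:
    """Bytes in a legacy codepage (hypothetical type)."""
    data: bytes
    codepage: str = "cp1252"  # assumption: stand-in for the system codepage


def to_utf8(s):
    """Single conversion point: accept either type, return a Utf8Str."""
    if isinstance(s, Utf8Str):
        return s
    if isinstance(s, AnsiStr):
        return Utf8Str(s.data.decode(s.codepage).encode("utf-8"))
    raise TypeError("expected Utf8Str or AnsiStr")


def to_wide_api(s):
    """Glue for an NT-style wide API: UTF-16LE code units, no BOM."""
    return to_utf8(s).data.decode("utf-8").encode("utf-16-le")


# The same file name, once in the ansi codepage and once in UTF-8.
name_ansi = AnsiStr("caf\u00e9.txt".encode("cp1252"))
name_utf8 = Utf8Str("caf\u00e9.txt".encode("utf-8"))

# The byte representations differ ...
assert name_ansi.data != name_utf8.data
# ... but both reach the wide API as identical UTF-16 data.
assert to_wide_api(name_ansi) == to_wide_api(name_utf8)
```

If UTF-8 just piggybacks on ansistring, both values above would have the
same type and the check in to_utf8 could not be made; every caller would
have to know which encoding it holds.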