[fpc-pascal] Unicode file routines proposal
Marco van de Voort
marcov at stack.nl
Mon Jun 30 16:35:20 CEST 2008
> On Mon, Jun 30, 2008 at 10:32 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > It should be possible to work in the native encoding. One doesn't want to
> > wrap _every_ function in _every_ header with conversion procs.
>
> It is not possible to work with an ever-changing encoding.
>
> MyLabel.Caption := 'Lição';
>
> How would that ever work with an ever-changing encoding? It would not.
Encoding in source is something totally different. Syntax like '\u1232\u2314'
can be converted to UTF-8 or UTF-16 by the compiler. In theory at least;
practice might turn out otherwise.
> If you go to the real implementation level, a changing encoding quickly
> becomes unmanageable.
That's why I don't believe the one-string-type, two-encodings approach helps.
But if FileExists is UTF-8 on Unix and UTF-16 on Windows, and any UTF-16 or
UTF-8 string that you pass from Lazarus is converted automatically, what is
the exact problem? Everybody can maintain certain subsystems in a certain
encoding without forcing that choice upon others.
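A minimal sketch of that idea, assuming FPC-style automatic conversion between string types at the call site (NativeFileExists is a hypothetical name; the real RTL routine and overloads may differ):

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

{$IFDEF WINDOWS}
// On Windows the native routine takes the platform encoding, UTF-16.
function NativeFileExists(const Name: UnicodeString): Boolean;
begin
  Result := SysUtils.FileExists(Name);
end;
{$ELSE}
// On Unix the native routine takes UTF-8.
function NativeFileExists(const Name: UTF8String): Boolean;
begin
  Result := SysUtils.FileExists(Name);
end;
{$ENDIF}

var
  U16: UnicodeString;
begin
  U16 := 'no-such-file-xyz.txt';
  // Passing a UTF-16 string on Unix (or a UTF-8 one on Windows) still
  // compiles: the compiler inserts the encoding conversion automatically.
  WriteLn(NativeFileExists(U16)); // FALSE unless the file happens to exist
end.
```

The point is that callers in the "wrong" encoding pay one implicit conversion, while callers already in the native encoding pay nothing.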
> And what about the LFM files? In which encoding will they be?
The one annotated in the file? The loading code can decode both, since both
systems support both encodings.
> What if you develop software on one system and try to build it on
> another?
What does that mean for the fully UTF-16 system? For a start, you would have
to wrap all C APIs that use UTF-8 on Unix.
I understand the simplicity of one encoding is appealing, but you have to
look at all aspects, and that is not just representation in the GUI.
It will mean that _every_ string transaction with the outside world will have
to be manually wrapped AND incur a performance penalty. That is a heavy price
to pay for not touching a bit of .lfm loading code.
> Ok, to go one step further: Has anyone ever seen a fully Unicode
> system which works with changing encodings? I believe there exists
> none, because this is not a good solution.
How many systems do you know that move data files like .lfm's across system
borders?
> > Well, they will have to do that with one string type too, at every
> > external barrier.
>
> This is already necessary.
But if you type them properly, some conversions may be automatic. That is
something you don't get with a single type.
> > That also kills the benefit of choosing UTF-16 in the first place, since
> > Delphi code won't work on Unix without manually inserting a lot of
> > conversion code.
>
> Delphi code can use the ANSI routines, which could just call the
> UTF-16 routines with a string conversion, or you can implement every
> routine twice to maximize speed.
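The quoted shim approach could be sketched like this, assuming a single UTF-16 core routine (FileExistsW and FileExistsA are hypothetical names, delegating to the RTL only for illustration):

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical UTF-16 core routine: all real work happens here.
function FileExistsW(const Name: UnicodeString): Boolean;
begin
  Result := SysUtils.FileExists(Name);
end;

// ANSI entry point: a thin wrapper that converts and delegates.
// Every call pays one string conversion, which is the cost of
// keeping a single UTF-16 implementation.
function FileExistsA(const Name: AnsiString): Boolean;
begin
  Result := FileExistsW(UnicodeString(Name));
end;

begin
  WriteLn(FileExistsA('no-such-file-xyz.txt')); // FALSE unless it exists
end.
```

Implementing every routine twice avoids that per-call conversion, at the price of maintaining two bodies per routine.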
If the Unicode code is not compatible with Delphi (UTF-16), there is no
point in using UTF-16 in the first place.