[fpc-pascal] Unicode file routines proposal
Marco van de Voort
marcov at stack.nl
Mon Jun 30 16:35:20 CEST 2008
> On Mon, Jun 30, 2008 at 10:32 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > It should be possible to work in the native encoding. One doesn't want to
> > wrap _every_ function in _every_ header with conversion procs.
>
> It is not possible to work with an ever-changing encoding.
>
> MyLabel.Caption := 'Lição';
>
> How would that ever work with an ever-changing encoding? It would not.
Encoding in source is something totally different. Syntax like '\u1232\u2314'
can be converted to UTF-8 or UTF-16 by the compiler. In theory at least;
practice might turn out otherwise.
> If you go to the real implementation level, a changing encoding quickly
> becomes unmanageable.
That's why I don't believe the one-string-type, two-encodings approach helps.
But if FileExists is UTF-8 on Unix and UTF-16 on Windows, and any UTF-16 or
UTF-8 string that you pass from Lazarus is converted automatically, what is
the exact problem? Everybody can maintain certain subsystems in a certain
encoding without forcing that choice upon others.
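A minimal sketch of that idea, assuming FPC-style automatic conversion between string types at the call site (NativeFileExists is a hypothetical name; the real RTL routine and overloads may differ):

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

{$IFDEF WINDOWS}
// On Windows the native routine takes the platform encoding, UTF-16.
function NativeFileExists(const Name: UnicodeString): Boolean;
begin
  Result := SysUtils.FileExists(Name);
end;
{$ELSE}
// On Unix the native routine takes UTF-8.
function NativeFileExists(const Name: UTF8String): Boolean;
begin
  Result := SysUtils.FileExists(Name);
end;
{$ENDIF}

var
  U16: UnicodeString;
begin
  U16 := 'no-such-file-xyz.txt';
  // Passing a UTF-16 string on Unix (or a UTF-8 one on Windows) still
  // compiles: the compiler inserts the encoding conversion automatically.
  WriteLn(NativeFileExists(U16)); // FALSE unless the file happens to exist
end.
```

The point is that callers in the "wrong" encoding pay one implicit conversion, while callers already in the native encoding pay nothing.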
> And what about the LFM files? In which encoding will they be?
The one annotated in the file? The loading code can decode both, since both
systems support both encodings.
> What if you develop software on one system and try to build it on
> another?
What does that mean for the fully UTF-16 system? For a start, you would have
to wrap all C APIs that use UTF-8 on Unix.
I understand the simplicity of one encoding is appealing, but you have to
look at all aspects, and that is not just representation in the GUI.
It will mean that _every_ string transaction with the outside world will have
to be manually wrapped AND incur a performance penalty. That is a heavy price
to pay for not touching a bit of .lfm loading code.
> Ok, to go one step further: Has anyone ever seen a fully Unicode
> system which works with changing encodings? I believe there exists
> none, because this is not a good solution.
How many systems do you know that move data files like .lfm's across system
borders?
> > Well, they will have to do that with one string type too, at every
> > external barrier.
>
> This is already necessary.
But if you type them properly, some conversions may be automatic. That is
something you don't get with a single type.
> > That also kills the benefit of choosing UTF-16 in the first place, since
> > Delphi code won't work on Unix without manually inserting a lot of
> > conversion code.
>
> Delphi code can use the ANSI routines, which could just call the
> UTF-16 routines with a string conversion, or you can implement every
> routine twice to maximize speed.
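The quoted shim approach could be sketched like this, assuming a single UTF-16 core routine (FileExistsW and FileExistsA are hypothetical names, delegating to the RTL only for illustration):

```pascal
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical UTF-16 core routine: all real work happens here.
function FileExistsW(const Name: UnicodeString): Boolean;
begin
  Result := SysUtils.FileExists(Name);
end;

// ANSI entry point: a thin wrapper that converts and delegates.
// Every call pays one string conversion, which is the cost of
// keeping a single UTF-16 implementation.
function FileExistsA(const Name: AnsiString): Boolean;
begin
  Result := FileExistsW(UnicodeString(Name));
end;

begin
  WriteLn(FileExistsA('no-such-file-xyz.txt')); // FALSE unless it exists
end.
```

Implementing every routine twice avoids that per-call conversion, at the price of maintaining two bodies per routine.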
If the Unicode code is not compatible with Delphi (UTF-16), there is no
point in using UTF-16 in the first place.