[fpc-pascal] Unicode file routines proposal
Marco van de Voort
marcov at stack.nl
Tue Jul 1 14:21:34 CEST 2008
> On Tue, Jul 1, 2008 at 9:02 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > A solution for unicode should be for everything, not just for UIs and
> > filenames. I should be able to carry data within it also, because otherwise
> > we will be having this discussion again next week if Joost needs unicode for
> > DB related issues etc.
>
> Ok, but how do you know that everyone wants to store data in the
> "system" encoding?
Well, the main reason is that most programs and data on the system use
the system encoding?
> What if I want to store data using ansistring in Windows because my
> file is UTF-8?
Then I'd say you convert. But that is the point. The need for conversion should be
the exception (data that differs from the default system encoding), not the rule.
> In my system I propose that simply a TWideStringList be implemented,
> so both ways of storing data are available everywhere.
But I don't have a UTF-8 type in your system to operate on.
> > How? I can't express the foreign encoding because I have no type for it. I
> > only have ansistring that can mean pretty much everything, and that
> > constitutes no compile-time safety.
>
> ansistrings don't mean everything. They mean either ISO or utf-8.
Yes. Which is why there is a need for a separate UTF-8 type as well as the
UTF-16 type. That way the compiler knows for sure that something is UTF-8 and can
insert conversions. It can also error/hint/warn you to insert manual conversions if
you assign a Unicode type (either one) to an ansistring.
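
To make the argument concrete, here is a minimal sketch of what a statically
known encoding buys you. It is not a proposal for the RTL: the record wrappers
TUtf8Text and TUtf16Text and their field names are hypothetical, invented for
this illustration, and it assumes only the existing UTF8Decode routine.

```pascal
program Utf8TypeSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical wrapper types, named for this illustration only.
  TUtf8Text  = record Bytes: UTF8String; end;  // contents are always UTF-8
  TUtf16Text = record Units: WideString; end;  // contents are always UTF-16

// Because both encodings are known from the types, the compiler can apply
// this conversion by itself whenever a TUtf8Text is assigned to a TUtf16Text.
operator := (const S: TUtf8Text) R: TUtf16Text;
begin
  R.Units := UTF8Decode(S.Bytes);
end;

var
  A: TUtf8Text;
  B: TUtf16Text;
  Legacy: AnsiString;  // encoding unknown to the compiler
begin
  // Build the two UTF-8 bytes of U+00E9 by hand, so the result does not
  // depend on the encoding of this source file.
  SetLength(A.Bytes, 2);
  A.Bytes[1] := #$C3;
  A.Bytes[2] := #$A9;

  B := A;          // accepted: the conversion above is inserted
  // Legacy := A;  // rejected at compile time: no conversion is defined,
                   // so a manual, explicit conversion is required here

  Legacy := A.Bytes;  // the explicit escape hatch: the bytes, still UTF-8
  WriteLn(Length(A.Bytes), ' byte(s), ', Length(B.Units), ' UTF-16 unit(s)');
end.
```

With a plain ansistring on both sides, the compiler could neither insert the
first assignment's conversion nor reject the commented-out one.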
> >> I bet you would convert automatically from whatever to ansi when going
> >> to an ansistring, but Lazarus uses utf-8 in ansistrings.
> >
> > But that is Lazarus specific.
>
> Lazarus is by far the largest project using Free Pascal?
FPC itself?
Anyway, that doesn't matter. A solution for FPC must have broad support, not
just Lazarus's.
> > Because the decision to put utf-8 in ansistrings is too fundamentally flawed
> > to implement such a thing, since it is perfectly legal for an ansistring
> > not to contain utf8
>
> We concluded that utf-8 in ansistrings is a very convenient solution
> for us which works very well today. It provided a smooth migration
> path and keeps the vast majority of code working.
Because you had no choice.