[fpc-pascal] Unicode file routines proposal

Marco van de Voort marcov at stack.nl
Tue Jul 1 14:21:34 CEST 2008


> On Tue, Jul 1, 2008 at 9:02 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > A solution for unicode should be for everything, not just for UIs and
> > filenames. I should be able to carry data within it also, because otherwise
> > we are having this discussion next week again if Joost needs unicode for DB
> > related issues etc.
> 
> Ok, but how do you know that everyone wants to store data in the
> "system" encoding?

Well, euh, the main reason is that, euh, most programs and data on the system use
the system encoding?

> What if I want to store data using ansistring in Windows because my
> file is UTF-8?

Then I'd say you convert. But that is the point. The need for conversion should be
the exception (different from the default system encoding), not the rule.
 
> In my system I propose that simply a TWideStringList be implemented,
> so both ways of storing data are available everywhere.

But I don't have a UTF-8 type in your system to operate on.

> > How? I can't express the foreign encoding because I have no type for it. I
> > only have ansistring that can mean pretty much everything, and that
> > constitutes no compiletime safety.
> 
> ansistrings don't mean everything. They mean either ISO or utf-8.

Yes. Which is why there is a need for a separate UTF-8 type as well as the
UTF-16 type. So that the compiler knows for sure something is UTF-8, and can
insert conversions. And it can give you an error/hint/warning telling you to insert
a manual conversion if you assign a Unicode type (either one) to an ansistring.
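
To make the compile-time safety argument concrete, here is a minimal sketch. It is
written in Rust, not Pascal, purely as an illustration, and the names Latin1Text,
Utf8Text, to_latin1_lossy and byte_len_utf8 are made up for this example; none of
this is FPC syntax or a proposed API. The idea it shows: once each encoding has its
own type, conversions live in one well-defined place, and passing the wrong
encoding is a type error instead of silently corrupted data.

```rust
// Hypothetical types for illustration only; not an FPC or Lazarus API.

// Text stored as ISO-8859-1: exactly one byte per character.
#[derive(Debug, Clone, PartialEq)]
struct Latin1Text(Vec<u8>);

// Text stored as UTF-8 (a Rust String is always valid UTF-8).
#[derive(Debug, Clone, PartialEq)]
struct Utf8Text(String);

// Because the source encoding is known from the type, the conversion is
// unambiguous: every ISO-8859-1 byte is the code point of the same value.
impl From<&Latin1Text> for Utf8Text {
    fn from(s: &Latin1Text) -> Self {
        Utf8Text(s.0.iter().map(|&b| b as char).collect())
    }
}

impl Utf8Text {
    // The reverse direction can lose data, so it is an explicit call:
    // characters outside ISO-8859-1 are replaced by '?'.
    fn to_latin1_lossy(&self) -> Latin1Text {
        Latin1Text(
            self.0
                .chars()
                .map(|c| if (c as u32) < 256 { c as u8 } else { b'?' })
                .collect(),
        )
    }
}

// A routine that only accepts UTF-8 text.
fn byte_len_utf8(s: &Utf8Text) -> usize {
    s.0.len()
}

fn main() {
    // "café" encoded as ISO-8859-1 (0xE9 is e-acute).
    let legacy = Latin1Text(vec![0x63, 0x61, 0x66, 0xE9]);

    // Conversion is possible because the type records the encoding.
    let unicode: Utf8Text = (&legacy).into();
    println!("{} bytes as UTF-8", byte_len_utf8(&unicode));

    // byte_len_utf8(&legacy); // would not compile: mismatched types

    // Explicit, possibly lossy conversion back.
    let back = unicode.to_latin1_lossy();
    println!("round trip ok: {}", back == legacy);
}
```

With a single untyped byte string, as with ansistring today, neither the conversion
nor the rejected call above could be decided by the compiler.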

> >> I bet you would convert automatically from whatever to ansi when going
> >> to an ansistring, but Lazarus uses utf-8 in ansistrings.
> >
> > But that is lazarus specific.
> 
> Lazarus is by far the largest project using Free Pascal?

FPC itself?

Anyway, that doesn't matter. A solution for FPC must be broadly supported, not
just by Lazarus.
 
> > Because the decision to put utf-8 in ansistrings is too fundamentally flawed
> > to implement such a thing, since it is perfectly legal for an ansistring
> > not to contain utf8
> 
> We concluded that utf-8 in ansistrings is a very convenient solution
> for us which works very well today. It provided a smooth migration
> path and keeps the vast majority of code working.

Because you had no choice.
 


