[fpc-pascal] Unicode file routines proposal

Marco van de Voort marcov at stack.nl
Tue Jul 1 14:02:03 CEST 2008


> On Tue, Jul 1, 2008 at 4:23 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > Certainly. Can you imagine loading a non trivial file in a tstringlist and
> > saving it again and the heaps of conversions?
> 
> And how do you know that the file to be loaded will be in the system
> encoding? 

Not at all. That is the programmer's problem, and depends on what he
knows (or can detect); it is not the system's choice.

Point is that if I only have UTF-16, I must manually convert incoming UTF-8
data, do some minor processing (like appending a string to a tstringlist),
and then convert it back before writing.

And I have to, since UTF-16 is my only type.
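
A minimal sketch of that round trip, written in Python purely for
illustration (this is not FPC code, and the function names are hypothetical).
It assumes the file on disk is UTF-8 and compares a UTF-16-only string type
with a type that can hold the data as it arrived:

```python
def append_line_utf16_only(raw_utf8: bytes, extra: str) -> bytes:
    """Append one line when UTF-16 is the only string type available."""
    # Conversion 1: the whole file is converted on the way in.
    internal = raw_utf8.decode("utf-8").encode("utf-16-le")
    # The "minor processing": append a single line.
    internal += (extra + "\n").encode("utf-16-le")
    # Conversion 2: the whole file is converted again on the way out.
    return internal.decode("utf-16-le").encode("utf-8")


def append_line_native(raw_utf8: bytes, extra: str) -> bytes:
    """Append one line when the string type can carry UTF-8 directly."""
    # Only the new line is encoded; the existing data is left untouched.
    return raw_utf8 + (extra + "\n").encode("utf-8")
```

Both functions produce the same bytes; the difference is that the first one
converts every byte of the file twice to append one line.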

> We should simply not do any conversion or any assumption when loading a
> file in a TStringList, so nothing changes here.

You have an assumption, since your TStringList would be UTF-16 strings only.
 
> We are talking about strings like the filename in LoadFromFile, and
> not about the string to hold the contents. This would always be an
> ansistring and if someone needs to load a utf-16 file he needs to
> build a TWideStringList.

A solution for unicode should be for everything, not just for UIs and
filenames. I should be able to carry data within it also, because otherwise
we will be having this discussion again next week when Joost needs unicode
for DB-related issues, etc.
 
> > Moreover, there is an important reason missing:
> >
> > * Being able to declare the outside world in the right encoding, without
> >  manually inserting conversions in each header.
> 
> This has nothing to do with this. With a fixed encoding you can also
> have automatic conversions.

How? I can't express the foreign encoding because I have no type for it. I
only have ansistring, which can mean pretty much anything, and that
provides no compile-time safety.
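
To illustrate what "a type for it" buys, here is a rough sketch in Python
(for illustration only, not FPC code). The type names, the converter and
external_api are all hypothetical, and the check is done by a static type
checker such as mypy rather than by a compiler:

```python
from typing import NewType

# Hypothetical encoding-specific types; at runtime both are plain bytes.
Utf8Bytes = NewType("Utf8Bytes", bytes)
Latin1Bytes = NewType("Latin1Bytes", bytes)


def latin1_to_utf8(s: Latin1Bytes) -> Utf8Bytes:
    """Explicit conversion between the two declared encodings."""
    return Utf8Bytes(s.decode("latin-1").encode("utf-8"))


def external_api(name: Utf8Bytes) -> int:
    """Hypothetical foreign routine whose header declares UTF-8 input."""
    return len(name.decode("utf-8"))


legacy = Latin1Bytes("caf\xe9".encode("latin-1"))

# external_api(legacy) would be flagged by a static checker: wrong encoding type.
count = external_api(latin1_to_utf8(legacy))
```

With a single untyped byte-string type, the commented-out call would pass any
check and only fail (or silently corrupt data) at run time.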
 
> I bet you would convert automatically from whatever to ansi when going
> to a ansistring, but Lazarus uses utf-8 in ansistrings.

But that is Lazarus-specific.
 
> We do manual conversions in Lazarus because FPC misses a solution for
> automatic conversion using utf-8 in ansistrings.

Because the decision to put utf-8 in ansistrings is too fundamentally flawed
to implement such a thing, since it is perfectly legal for an ansistring not
to contain utf-8.
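
A small sketch of why an automatic conversion cannot simply assume UTF-8,
again in Python for illustration only (is_valid_utf8 is a hypothetical
helper, and the byte strings stand in for ansistring contents):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same text stored under two different encodings in an untyped byte string.
as_utf8 = "na\xefve".encode("utf-8")
as_latin1 = "na\xefve".encode("latin-1")

utf8_ok = is_valid_utf8(as_utf8)        # well-formed UTF-8
latin1_ok = is_valid_utf8(as_latin1)    # a lone 0xEF byte is not valid UTF-8
```

Both values are perfectly legal contents for an untyped byte string, but only
one of them survives a conversion that assumes UTF-8.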


