[fpc-pascal] Unicode file routines proposal
Marco van de Voort
marcov at stack.nl
Tue Jul 1 14:02:03 CEST 2008
> On Tue, Jul 1, 2008 at 4:23 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > Certainly. Can you imagine loading a non-trivial file into a TStringList
> > and saving it again, with all the conversions that involves?
>
> And how do you know that the file to be loaded will be in the system
> encoding?
Not at all. That is the programmer's problem and depends on what he knows
(or can detect); it is not the system's choice.
The point is that if I only have UTF-16, I must manually convert incoming
UTF-8 data, do some minor processing (like appending a string to a
TStringList), and then convert it back before writing.
And I have to, since UTF-16 is my only string type.
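To make that round trip concrete, here is a minimal Python sketch (an illustration only, not FPC code) of a hypothetical model in which the only string type holds UTF-16. UTF-8 input is decoded and re-encoded as UTF-16-LE on load, and every line is converted back to UTF-8 on save. The function names `load_lines_utf16` and `save_lines_utf8` are hypothetical.

```python
from typing import List

# Hypothetical model: the only string type stores UTF-16 code units,
# so UTF-8 data must be converted on the way in and again on the way out.

def load_lines_utf16(raw: bytes) -> List[bytes]:
    """Conversion #1: decode UTF-8 input and store each line as UTF-16-LE."""
    text = raw.decode("utf-8")
    return [line.encode("utf-16-le") for line in text.split("\n")]

def save_lines_utf8(lines: List[bytes]) -> bytes:
    """Conversion #2: turn every UTF-16-LE line back into UTF-8 before writing."""
    return "\n".join(line.decode("utf-16-le") for line in lines).encode("utf-8")

raw_in = "naïve\ncafé".encode("utf-8")
lines = load_lines_utf16(raw_in)
# The "minor processing": appending one line, which must itself be converted first.
lines.append("extra".encode("utf-16-le"))
raw_out = save_lines_utf8(lines)
```

The file content ends up unchanged apart from the appended line, yet every line has been converted twice just to get there.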
> We should simply not do any conversion or any assumption when loading a
> file in a TStringList, so nothing changes here.
You are making an assumption too, since your TStringList would hold UTF-16 strings only.
> We are talking about strings like the filename in LoadFromFile, and
> not about the string to hold the contents. This would always be an
> ansistring and if someone needs to load a utf-16 file he needs to
> build a TWideStringList.
A solution for Unicode should cover everything, not just UIs and
filenames. I should also be able to carry data in it; otherwise we will be
having this discussion again next week when Joost needs Unicode for
DB-related issues, etc.
> > Moreover, there is an important reason missing:
> >
> > * Being able to declare the outside world in the right encoding, without
> > manually inserting conversions in each header.
>
> This has nothing to do with this. With a fixed encoding you can also
> have automatic conversions.
How? I can't express the foreign encoding because I have no type for it. I
only have ansistring, which can mean pretty much anything, and that
provides no compile-time safety.
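As a rough analogy for what a type for the foreign encoding buys, the Python sketch below (illustration only, not FPC code) tags byte strings with their encoding using `typing.NewType`. A static checker such as mypy treats `Utf8Bytes` and `Cp1252Bytes` as distinct types, so passing one where the other is declared is flagged before the program runs; at runtime `NewType` adds no checks. The type names and `foreign_open` are hypothetical stand-ins for an external header declared in a specific encoding.

```python
from typing import NewType

# Hypothetical encoding-tagged byte-string types. A static checker (e.g. mypy)
# treats them as distinct, so mixing them up is reported before running.
Utf8Bytes = NewType("Utf8Bytes", bytes)
Cp1252Bytes = NewType("Cp1252Bytes", bytes)

def cp1252_to_utf8(data: Cp1252Bytes) -> Utf8Bytes:
    """The single, explicitly typed place where the conversion happens."""
    return Utf8Bytes(data.decode("cp1252").encode("utf-8"))

def foreign_open(path: Utf8Bytes) -> int:
    """Stand-in for an external routine declared as taking UTF-8 (hypothetical)."""
    return len(path)

legacy_name = Cp1252Bytes("café.txt".encode("cp1252"))
# foreign_open(legacy_name)  # a static checker rejects this: wrong encoding type
n = foreign_open(cp1252_to_utf8(legacy_name))
```

With a plain `bytes` (or a plain ansistring) parameter, both calls would be accepted and the missing conversion would go unnoticed.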
> I bet you would convert automatically from whatever to ansi when going
> to an ansistring, but Lazarus uses UTF-8 in ansistrings.
But that is Lazarus-specific.
> We do manual conversions in Lazarus because FPC misses a solution for
> automatic conversion using utf-8 in ansistrings.
Because the decision to put UTF-8 in ansistrings is too fundamentally flawed
to build such a thing on, since it is perfectly legal for an ansistring not
to contain UTF-8.
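A short Python illustration (not FPC code) of why a UTF-8 assumption cannot be enforced on a byte-oriented string type: the same bytes are perfectly valid CP1252 ("ANSI") text but invalid UTF-8. The helper `is_valid_utf8` is hypothetical.

```python
# The same bytes mean different things depending on the encoding assumed.
latin_bytes = "café".encode("cp1252")  # b'caf\xe9': legal single-byte ANSI data

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(latin_bytes))             # False: 0xE9 opens a multi-byte sequence that never completes
print(latin_bytes.decode("cp1252"))           # café
print(is_valid_utf8("café".encode("utf-8")))  # True
```

Pure ASCII data is valid in both encodings, which is why a UTF-8 assumption can appear to work until the first non-ASCII character arrives.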