[fpc-pascal] Unicode file routines proposal

Marco van de Voort marcov at stack.nl
Tue Jul 1 09:23:52 CEST 2008


> On Mon, Jun 30, 2008 at 11:35 AM, Marco van de Voort <marcov at stack.nl> wrote:
> > borders?
> 
> Gtk can load XML files, somewhat equivalent to our LFMs. They use
> UTF-8 everywhere.

GTK is Unix-centric, even on other systems. They don't have a firm footing in
both the Unix and the Windows world, as we do. I can't judge the wxWidgets
situation, since I know nobody who uses it.
 
> Java is cross-platform and uses UTF-16 everywhere.

Java has to emulate everything from the outside anyway (read: put up a
barrier), and not having to do that is one of our fortes.

> multiple encodings:
> 
> * More complex
> * Innovative solution, no known example of an implementation of this
> system exists = uncertainty if it works at all, or if it is convenient
> for developers
> * Depends on a not yet implemented string type

That needs to be done anyway, since widestring on Windows is the COM string
type, and that must also be retained. So it is about adding one vs. two, and
the work will be huge with UTF-16 too. To make it worthwhile, the best
solution should be sought, not the quickest.

> * Potentially will have a higher performance than a single encoding
> system, but only if you use this new special string type

Certainly. Can you imagine loading a non-trivial file into a TStringList and
saving it again, with the heaps of conversions that involves?

Moreover, there are important reasons missing:

* Being able to declare the outside world in the right encoding, without
  manually inserting conversions in each header.

* Does not make one of the two core platforms (Unix/Windows) effectively
  second-rate.

* Can be done in phases, IOW lots of conversions in the beginning, with more
  and more routines available in the right encoding later.

> Single encoding:
> 
> * Simple, proved solution

Simple solution, complex implementation (needs conversions everywhere).

> * Does not need any new string type, can start being implemented immediately

It does. And you can start making UTF-16 routines anyway.

> * Potentially has a lower performance due to string conversions.




