[fpc-pascal] Unicode file routines proposal

Mattias Gaertner nc-gaertnma at netcologne.de
Tue Jul 1 10:15:44 CEST 2008


On Tue, 1 Jul 2008 09:23:52 +0200 (CEST)
marcov at stack.nl (Marco van de Voort) wrote:

>[...]
> > multiple encodings:

Are we talking about one encoding per platform or two encodings for
all platforms?
Under Unix the encoding preference is clear: UTF-8.
Under Windows there is a lot of existing text in the current code page,
plus the UTF-16 W functions. So which encoding is preferred under
Windows? UTF-16 plus Ansi, like the A and W functions?


> > * More complex
> > * Innovative solution, no known example of an implementation of this
> > system exists = uncertainty if it works at all, or if it is
> > convenient for developers
> > * Depends on a not yet implemented string type
> 
> Needs to be done anyway, since widestring on Windows is COM, and that
> must also be retained. So it is about adding 1 vs 2, and the work
> will be huge, with UTF-16 too, and to make it worthwhile the best,
> not the quickest, solution should be sought.
> 
> > * Potentially will have a higher performance than a single encoding
> > system, but only if you use this new special string type
> 
> Certainly. Can you imagine loading a non-trivial file into a
> TStringList and saving it again, and the heaps of conversions?

Auto conversion of the strings in a TStringList does not make much
sense (and will break a lot of code). That's why I propose to keep one
default string type. If almost everything uses one string type, then no
conversion will take place. 
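
To make the conversion argument concrete, here is a minimal sketch in
Python (not FPC code; the function name load_and_save and the sample
lines are made up for illustration). It assumes the file on disk is
UTF-8 and counts the conversions a load/save round trip needs when the
in-memory string type matches the file encoding, and when it does not:

```python
def load_and_save(lines_utf8, store_encoding):
    """Load UTF-8 encoded lines into a list that keeps its strings in
    store_encoding, then save them back as UTF-8.

    Returns (saved_lines, number_of_conversions).
    """
    conversions = 0

    # Load: convert only if the in-memory encoding differs from the file.
    stored = []
    for raw in lines_utf8:
        if store_encoding == "utf-8":
            stored.append(raw)
        else:
            stored.append(raw.decode("utf-8").encode(store_encoding))
            conversions += 1

    # Save: convert back only if needed.
    saved = []
    for item in stored:
        if store_encoding == "utf-8":
            saved.append(item)
        else:
            saved.append(item.decode(store_encoding).encode("utf-8"))
            conversions += 1

    return saved, conversions


lines = ["Grüße".encode("utf-8"), b"plain ASCII"]
print(load_and_save(lines, "utf-8")[1])      # same encoding everywhere
print(load_and_save(lines, "utf-16-le")[1])  # mismatched string type
```

With one string type throughout, the count stays at 0. With a
mismatched type, every line is converted twice (4 conversions for the
two sample lines), which is the overhead discussed above.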

I think the main problem is that the RTL calls the Ansi functions
under Windows. Maybe we should not lose focus.
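
A rough illustration of why the Ansi calls are the problem, again as a
Python sketch rather than RTL code. The helper names as_ansi_call and
as_wide_call are hypothetical, cp1252 is only an assumed current code
page, and Python's '?' replacement stands in for whatever substitution
the real A functions perform. Either way, the original name cannot be
recovered:

```python
def as_ansi_call(name, codepage="cp1252"):
    """Bytes an A-style (code page) call could carry for this name.
    Characters outside the code page are replaced, here with '?'."""
    return name.encode(codepage, errors="replace")


def as_wide_call(name):
    """Bytes a W-style (UTF-16) call could carry for this name."""
    return name.encode("utf-16-le")


name = "日本語.txt"
ansi_name = as_ansi_call(name).decode("cp1252")
wide_name = as_wide_call(name).decode("utf-16-le")

print(ansi_name)          # name as seen through the code page
print(wide_name == name)  # UTF-16 keeps the name intact
```

Through the assumed code page the name degrades to "???.txt", while
the UTF-16 path round-trips unchanged.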

 
> Moreover, there is an important reason missing:
> 
> * Being able to declare the outside world in the right encoding,
> without manually inserting conversions in each header.
> 
> * Does not make one of the two core platforms (Unix/Windows)
> effectively second rate.

Windows needs at least two encodings per se. So whatever is decided, the
Windows part needs some more work.

 
> * Can be done phased, IOW in the beginning lots of conversion, but
> later have more and more routines in the right encoding ready.
> 
> > Single encoding:
> > 
> > * Simple, proven solution
> 
> Simple solution, complex implementation (needs conversions everywhere).
> 
> > * Does not need any new string type, can start being implemented
> > immediately
> 
> It does. And you can start making UTF-16 routines anyway.
> 
> > * Potentially has a lower performance due to string conversions.


Mattias


