[fpc-pascal] Unicode file routines proposal

Martin Schreiber fpmse at bluewin.ch
Wed Jul 2 07:10:17 CEST 2008


On Tuesday 01 July 2008 22.23:12 Marc Weustink wrote:
> Martin Schreiber wrote:
> > On Tuesday 01 July 2008 18.32:30 Mattias Gärtner wrote:
> >>> In these routines length(widestring), widestring[index], pwidechar^,
> >>> pwidechar[index], pwidechar + offset, pwidechar - pwidechar and
> >>> inc(pwidechar)/dec(pwidechar) are used often. This can't be done with
> >>> utf-8 strings.
> >>
> >> Ehm, do you know, that UTF-8 has the advantage, that many ascii
> >> functions work without change?
> >> For example ReplaceChar or searching a substring?
> >
> > Sure, but for layout calculation and the like we need fast access to
> > codepoints.
>
> The only way to be sure is using utf-32 in this case. (or not supporting
> unicode)
>
I'd like to repeat:
We are talking about the MSEgui framework here, not about the FPC RTL or FCL.
In MSEgui we need fast internal string and character handling routines which
support UCS-2. UCS-2 is enough even for the one active Chinese user I know
of. I don't want to slow down MSEgui for 100% of the MSEgui users because of
the theoretical possibility that someone needs code points which don't fit
into the Basic Multilingual Plane. If someone needs the whole Unicode range,
he can use surrogate pairs. They will not be displayed correctly on screen,
but all other tasks can be done. It is the same situation as with
ansistring/utf8string.
Using 16 bits instead of 8 bits as the storage base of the MSEgui string
representation has the big advantage that 100% of the MSEgui users can
access characters by a simple linear index. With an 8-bit base this would
probably be less than 20%, because MSEgui is mainly used by Russian-speaking
people. Most of the European users would be out of luck too, because of the
umlauts and accents.
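
To illustrate the indexing argument, here is a minimal sketch. It is not
MSEgui code; it only assumes FPC's WideString, UTF8String and UTF8Encode, and
the program name and sample text are made up for the example. The characters
are built from code values so that the result does not depend on the encoding
of the source file.

```pascal
program indexdemo;
{$mode objfpc}{$H+}

var
  w: WideString;   // 16-bit code units
  u: UTF8String;   // 8-bit code units
begin
  // 'Gär', built from character codes (sample text is an assumption).
  SetLength(w, 3);
  w[1] := WideChar(Ord('G'));
  w[2] := WideChar($00E4);        // 'ä' fits into one 16-bit code unit
  w[3] := WideChar(Ord('r'));

  u := UTF8Encode(w);             // same text, 'ä' now takes two bytes

  WriteLn('16-bit units: ', Length(w));            // 3
  WriteLn('8-bit units:  ', Length(u));            // 4
  // Index 3 is the third character in the 16-bit string ...
  WriteLn('w[3] = U+', HexStr(Ord(w[3]), 4));      // U+0072 ('r')
  // ... but only the second byte of 'ä' in the UTF-8 string.
  WriteLn('u[3] = $', HexStr(Ord(u[3]), 2));       // $A4

  // A code point outside the base plane (U+1D11E) needs a surrogate pair,
  // so here the linear index no longer equals the character position.
  SetLength(w, 2);
  w[1] := WideChar($D834);
  w[2] := WideChar($DD1E);
  WriteLn('Surrogate pair units: ', Length(w));    // 2
end.
```

The last part shows the case I am willing to accept: with surrogate pairs the
index counts 16-bit units, not characters, just as it counts bytes in the
8-bit case.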
Another need of the MSEgui users and the MSEgui routines is converting the
internal string representation to the current 8-bit system encoding. FPC
already supports this perfectly through the widestringmanager.
Xlib and GDI both have a widestring interface. The only drawback I see is
that there is no reference-counted FPC widestring type on Windows at the
moment.
The upcoming new Delphi version uses a simple reference-counted widestring
as its string base type too, AFAIK.
So if FPC decides to implement a reference-counted widestring on Windows for
Delphi compatibility, it should be available in OBJFPC mode too.
Conclusion:
MSEgui, and probably most of the MSEgui users too, has no need for a
multi-encoding string type at the expense of slower code and more memory
consumption; a reference-counted widestring on Windows would be enough.

Martin


