[fpc-pascal] Unicode file routines proposal
Martin Schreiber
fpmse at bluewin.ch
Wed Jul 2 07:10:17 CEST 2008
On Tuesday 01 July 2008 22.23:12 Marc Weustink wrote:
> Martin Schreiber wrote:
> > On Tuesday 01 July 2008 18.32:30 Mattias Gärtner wrote:
> >>> In these routines length(widestring), widestring[index], pwidechar^,
> >>> pwidechar[index], pwidechar + offset, pwidechar - pwidechar and
> >>> inc(pwidechar)/dec(pwidechar) are used often. This can't be done with
> >>> utf-8 strings.
> >>
> >> Ehm, do you know that UTF-8 has the advantage that many ASCII
> >> functions work without change?
> >> For example ReplaceChar or searching for a substring?
> >
> > Sure, but for layout calculation and the like we need fast access to
> > codepoints.
>
> The only way to be sure in this case is to use UTF-32 (or not to support
> Unicode).
>
I'd like to repeat:
We are talking about the MSEgui framework here, not about the FPC RTL or FCL.
In MSEgui we need fast internal string and character handling routines that
support UCS-2. UCS-2 is enough even for the only active Chinese user I know
of. I don't want to slow down MSEgui for 100% of its users because of the
theoretical possibility that someone needs code points that don't fit
into the Basic Multilingual Plane (BMP). If someone needs the whole Unicode
range, he can use surrogate pairs. They will not display correctly on screen,
but all other tasks can be done. It is the same situation as with
ansistring/utf8string.
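To make the surrogate-pair remark concrete: a code point above U+FFFF is stored as two 16-bit units, so a UCS-2-style routine sees two "characters" where there is only one. The sketch below is in Python purely for brevity (MSEgui itself is written in Pascal); the function name `to_utf16_units` is hypothetical and not part of MSEgui or the FPC RTL. It applies the standard UTF-16 surrogate formula.

```python
def to_utf16_units(cp):
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units.

    Hypothetical helper for illustration; not an MSEgui or FPC routine.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp <= 0xFFFF:
        return [cp]                  # BMP: one unit, identical to UCS-2
    v = cp - 0x10000                 # 20-bit offset above the BMP
    high = 0xD800 + (v >> 10)        # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # low (trail) surrogate: bottom 10 bits
    return [high, low]


# A BMP character is one unit; U+20000 (a CJK Extension B ideograph) needs two,
# so code that counts 16-bit units reports a length of 2 for one character.
print([hex(u) for u in to_utf16_units(0x00FC)])   # ['0xfc']
print([hex(u) for u in to_utf16_units(0x20000)])  # ['0xd840', '0xdc00']
```

Storing, copying and substring search keep working on such text, because the surrogates are just two more units from a reserved range; only code that assumes one unit per visible character, such as layout, gets it wrong.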
Using 16 bits instead of 8 bits as the storage unit of the MSEgui string
representation has the big advantage that 100% of the MSEgui users can
access characters by a simple linear index. Because MSEgui is mainly used by
Russian-speaking people, with 8-bit (UTF-8) storage this would probably hold
for less than 20% of them, since Cyrillic letters take two bytes in UTF-8.
Most European users would be out of luck too because of umlauts and accents.
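The indexing argument can be illustrated with a small sketch, again in Python only for brevity; the helper names `ucs2_units` and `utf8_offset_of` are hypothetical. In UCS-2 the n-th character is simply the n-th 16-bit unit, while in UTF-8 the byte offset of the n-th code point has to be found by scanning from the start, because Cyrillic and accented Latin letters take two bytes each.

```python
import struct


def ucs2_units(text):
    """Return the 16-bit code units of text (valid UCS-2 only if text is in the BMP)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def utf8_offset_of(data, n):
    """Return the byte offset of the n-th (0-based) code point in UTF-8 bytes.

    Needs a linear scan: every byte that is not a continuation byte
    (bit pattern 10xxxxxx) starts a new code point.
    """
    count = -1
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:
            count += 1
            if count == n:
                return i
    raise IndexError(n)


text = "Привет, Grüße"
units = ucs2_units(text)
utf8 = text.encode("utf-8")

# UCS-2: direct O(1) access, the index is the character position.
print(chr(units[10]))                          # ü
# UTF-8: character 10 starts at byte 16, found only by scanning from the start.
off = utf8_offset_of(utf8, 10)
print(off, utf8[off:off + 2].decode("utf-8"))  # 16 ü
```

The UCS-2 shortcut relies on the BMP-only assumption made in the post; once surrogate pairs appear, the unit index and the character index diverge again.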
Another requirement of the MSEgui users and the MSEgui routines is converting
the internal string representation to the current 8-bit system encoding. FPC
already supports this perfectly through the widestringmanager.
Xlib and GDI both have a widestring interface. The only drawback I see is that
there is currently no reference-counted FPC widestring type on Windows.
AFAIK the upcoming Delphi version also uses a simple reference-counted
widestring as its base string type.
So if FPC decides to implement a reference-counted widestring on Windows for
Delphi compatibility, it should be available in OBJFPC mode too.
Conclusion:
MSEgui, and probably most MSEgui users too, have no need for a multi-encoding
string type at the expense of slower code and higher memory consumption;
a reference-counted widestring on Windows would be enough.
Martin