[fpc-pascal] Unicode file routines proposal

Wed Jul 2 11:08:31 CEST 2008

Quoting Martin Schreiber <fpmse at bluewin.ch>:

> On Wednesday 02 July 2008 09.32:17 Mattias Gaertner wrote:
> > On Tue, 1 Jul 2008 18:55:44 +0200
> >
> > Martin Schreiber <fpmse at bluewin.ch> wrote:
> > > On Tuesday 01 July 2008 18.32:30 Mattias Gärtner wrote:
> > > > > In these routines, length(widestring), widestring[index],
> > > > > pwidechar^, pwidechar[index], pwidechar + offset, pwidechar -
> > > > > pwidechar and inc(pwidechar)/dec(pwidechar) are used often. This
> > > > > can't be done with UTF-8 strings.
> > > >
> > > > Ehm, do you know that UTF-8 has the advantage that many ASCII
> > > > functions work without change?
> > > > For example, ReplaceChar or searching for a substring?
> > >
> > > Sure, but for layout calculation and the like we need fast access to
> > > codepoints.
> >
> > Can you point me to an example function, where this is critical?
> >
> For example lib/common/kernel/msedrawtext.pas:223, procedure layouttext.

Nice code.
As far as I can see, it handles tabs, line breaks, c_softhyphen and charwidth. It
uses one-array-element-per-character optimizations, such as the charwidths
array.
I think I would simply keep that and define the character width of the follow-up
elements of a multi-byte character as 0.
Then you only need to change the places where you check for c_softhyphen.
And because this is a constant, you can even use some tricks here.
I don't see how this would have a big impact on performance.
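
A rough sketch of the idea, not taken from msedrawtext.pas: the widths array
keeps one element per byte of the UTF-8 string, the lead byte of each character
carries the full width, and every continuation byte gets width 0. The soft
hyphen check becomes a two-byte comparison against the constant encoding of
U+00AD. GlyphWidth, BuildWidths, IsContinuation and IsSoftHyphenAt are
hypothetical names used only for illustration, and GlyphWidth is a dummy that
stands in for a real font metric lookup.

```pascal
program utf8widths;
{$mode objfpc}{$H+}

const
  // U+00AD SOFT HYPHEN is encoded in UTF-8 as the two bytes $C2 $AD.
  SoftHyphenByte1 = $C2;
  SoftHyphenByte2 = $AD;

type
  TWidthArray = array of Integer;

// Hypothetical stand-in for a font metric lookup: every character is 8 wide.
function GlyphWidth(const S: AnsiString; Index: Integer): Integer;
begin
  Result := 8;
end;

// True if C is a UTF-8 continuation byte (bit pattern 10xxxxxx).
function IsContinuation(C: AnsiChar): Boolean;
begin
  Result := (Ord(C) and $C0) = $80;
end;

// True if a UTF-8 encoded soft hyphen starts at byte position Index.
function IsSoftHyphenAt(const S: AnsiString; Index: Integer): Boolean;
begin
  Result := (Index >= 1) and (Index < Length(S))
    and (Ord(S[Index]) = SoftHyphenByte1)
    and (Ord(S[Index + 1]) = SoftHyphenByte2);
end;

// One width per byte: lead bytes carry the width, continuation bytes get 0.
function BuildWidths(const S: AnsiString): TWidthArray;
var
  I: Integer;
begin
  Result := nil;
  SetLength(Result, Length(S));
  for I := 1 to Length(S) do
    if IsContinuation(S[I]) then
      Result[I - 1] := 0
    else
      Result[I - 1] := GlyphWidth(S, I);
end;

var
  Line: AnsiString;
  Widths: TWidthArray;
  I, Total: Integer;
begin
  // 'a', U+00E4 ($C3 $A4), soft hyphen ($C2 $AD), 'b': 4 characters, 6 bytes.
  Line := 'a' + Chr($C3) + Chr($A4) + Chr($C2) + Chr($AD) + 'b';
  Widths := BuildWidths(Line);

  Total := 0;
  for I := 0 to High(Widths) do
    Total := Total + Widths[I];
  WriteLn('total width: ', Total);  // 4 characters * 8 = 32

  for I := 1 to Length(Line) do
    if IsSoftHyphenAt(Line, I) then
      WriteLn('soft hyphen at byte ', I);  // byte 4
end.
```

Summing the widths over any byte range that starts and ends on character
boundaries gives the same result as summing per character, so code that indexes
the array by string position can stay as it is. The two-byte comparison is only
one simple way to check for the constant; other tricks are possible.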

Mattias