[fpc-pascal] Unicode file routines proposal
Martin Schreiber
fpmse at bluewin.ch
Tue Jul 1 10:23:32 CEST 2008
On Tuesday 01 July 2008 09.56:29 Mattias Gaertner wrote:
> On Tue, 01 Jul 2008 09:35:35 +0200
> Luca Olivetti <luca at ventoso.org> wrote:
> > OTOH using variable length characters will make string operations
> > expensive (since you can't just multiply the index by 2 or 4 but you
> > have to examine the string from the beginning, and the length in
> > bytes isn't the same as the length in characters).
>
> It's amazing that this argument comes up again and again. But I know
> hardly any code that needs this index-to-char mapping. And the code
> that needs it is seldom time critical.
> (I must admit, I feared the same some years ago. But the extra cost is
> practically a myth.)
>
A good example is text layout calculation, where it is necessary to iterate
over characters (glyphs) over and over again. MSEgui uses widestrings
directly; fpGUI converts to widestrings before processing (or does it use the
slow UTF-8 routines?). I once switched MSEgui to UTF-8 because of the
widestring problems in FPC. One or two months later, when I implemented
complex layout calculation with tabulators and justified text, I switched back
to widestrings...
This applies to a GUI framework; for an RTL there may be other priorities.
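
To make the indexing difference concrete, here is a minimal sketch, not code from MSEgui or fpGUI. Utf8CharToBytePos is a hypothetical helper that assumes valid UTF-8 input; it has to scan from the start of the string to find the n-th character, while the widestring is indexed directly (as long as all characters are in the base plane).

```pascal
program Utf8IndexDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the 1-based byte position of the
// CharIndex-th code point (1-based) in S, or 0 if S has fewer code points.
// Assumes S holds valid UTF-8.
function Utf8CharToBytePos(const S: AnsiString; CharIndex: Integer): Integer;
var
  BytePos, Count: Integer;
begin
  Result := 0;
  Count := 0;
  for BytePos := 1 to Length(S) do
    // Continuation bytes have the form 10xxxxxx; any other byte starts a code point.
    if (Ord(S[BytePos]) and $C0) <> $80 then
    begin
      Inc(Count);
      if Count = CharIndex then
        Exit(BytePos);
    end;
end;

var
  U8: AnsiString;
  W: WideString;
begin
  // 'a', U+00E4, 'b' built from explicit bytes, so the source file encoding does not matter.
  U8 := 'a' + Chr($C3) + Chr($A4) + 'b';
  W := 'a';
  W := W + WideChar($00E4);
  W := W + 'b';

  WriteLn(Utf8CharToBytePos(U8, 3)); // found by scanning: third character starts at byte 4
  WriteLn(Ord(W[2]));                // direct index: second character is code 228 ($E4)
end.
```

A layout loop that asks for "character n" repeatedly pays the scan each time with UTF-8 unless it keeps its own byte position between calls.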
>
> Most code only needs the number of bytes. And this still costs O(1)
> in Pascal.
> In fact if a UTF8String or UTF16String would be added, then I would
> say, it would be a waste of memory to store an extra PtrInt for the
> number of characters.
>
Agreed.
I think the best compromise for a GUI framework is reference-counted
widestrings, where normally physical index = code point index. If one needs
characters that are not in the base plane, one must use surrogate pairs and
more complicated and slower processing. I assume this will seldom be needed.
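
The "more complicated and slower processing" for surrogate pairs could look roughly like the sketch below. NextCodePoint is a hypothetical helper, not an RTL or MSEgui routine; it assumes well-formed UTF-16 and an index within 1..Length(S).

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: returns the code point that starts at S[Index] and
// advances Index past it (by one code unit, or by two for a surrogate pair).
function NextCodePoint(const S: WideString; var Index: Integer): LongInt;
var
  Lead, Trail: LongInt;
begin
  Lead := Ord(S[Index]);
  Inc(Index);
  // A lead surrogate ($D800..$DBFF) followed by a trail surrogate
  // ($DC00..$DFFF) encodes one code point above the base plane.
  if (Lead >= $D800) and (Lead <= $DBFF) and (Index <= Length(S)) then
  begin
    Trail := Ord(S[Index]);
    if (Trail >= $DC00) and (Trail <= $DFFF) then
    begin
      Inc(Index);
      Exit($10000 + ((Lead - $D800) shl 10) + (Trail - $DC00));
    end;
  end;
  // Base-plane character (or an unpaired surrogate, returned unchanged).
  Result := Lead;
end;

var
  W: WideString;
  I: Integer;
begin
  W := 'A';
  W := W + WideChar($D834) + WideChar($DD1E); // U+1D11E as a surrogate pair
  I := 1;
  while I <= Length(W) do
    WriteLn(HexStr(NextCodePoint(W, I), 6));  // prints 000041, then 01D11E
end.
```

Code that only ever sees base-plane text can skip this and index the widestring directly, which is the common case assumed above.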
Martin