[fpc-devel] Unicodestring branch, please test and help fixing
Marco van de Voort
marcov at stack.nl
Wed Sep 10 15:02:51 CEST 2008
In our previous episode, Graeme Geldenhuys said:
> > The problem is how it applies to strings, and how they can be more
> > memory saving than a straight array of 16-bit values which are
> > copy-on-write.
>
> I think for a good code example of this, have a look at Java's
> Document class. It's not exactly what I'm talking about, but it's got
> the idea. The Document class forms the basic storing medium of all
> their text based components - from a simple TextEdit, TextArea to
> complex rich text documents. So it scales well.
How can you say that? For a text component the limit is whether a person
notices the delay, but a main string type must also serve server systems
that import a database export of several GB.
> Each character can have individual characteristics set. Storage down
> to character level. Similar to what I am suggesting with the Flyweight
> pattern - characters of a string with encoding information.
I can't see how you could stuff that in less than 16 bits? (since that would
be the storage now, and you said it would save memory)
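
To put rough numbers on that objection, here is a minimal back-of-the-envelope
sketch. It assumes (my assumption, the proposal does not specify this) that a
flyweight string stores one native pointer per character slot, each pointing
to a shared character object. All names and the 16-byte object size are
hypothetical, and string headers and allocator overhead are ignored.

```python
import struct

UTF16_UNIT_BYTES = 2                   # one 16-bit code unit per character
POINTER_BYTES = struct.calcsize("P")   # size of a native pointer on this machine


def flat_utf16_bytes(n_chars):
    """Payload of a straight array of 16-bit values (BMP characters only)."""
    return n_chars * UTF16_UNIT_BYTES


def flyweight_bytes(n_chars, n_distinct, shared_object_bytes):
    """Payload when every character slot is a pointer to a shared object.

    n_distinct shared objects exist once each; every slot still needs a
    full pointer to reach one of them.
    """
    return n_chars * POINTER_BYTES + n_distinct * shared_object_bytes


n = 1_000_000
flat = flat_utf16_bytes(n)
fly = flyweight_bytes(n, n_distinct=100, shared_object_bytes=16)
print("flat 16-bit array:", flat, "bytes")
print("flyweight slots:  ", fly, "bytes")
```

Under these assumptions the per-character reference alone is already 32 or
64 bits, so the flyweight layout cannot come out smaller than the flat
16-bit array, however few distinct characters are shared.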
> The Document class also uses an internal gapped buffer implementation
> to store its content - apparently good for performance.
It is one of many ways to avoid big delays on big continuous documents.
E.g. Word uses (classically) a different approach, where the document is a
set of references to paragraphs. That way you can swap entire paragraphs by
manipulating a few pointers.
It is also totally unrelated to string handling.
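
For readers who have not met the two document-storage approaches, a minimal
sketch of both follows. It is an illustration only, not how Java's Document
class or Word are actually implemented; the class and function names are
hypothetical. The gap buffer keeps a block of free slots at the edit point so
that repeated inserts there are cheap; the paragraph list reorders text by
swapping references.

```python
class GapBuffer:
    """Text stored in one array with a movable gap of free slots."""

    def __init__(self, text="", gap=16):
        self.buf = list(text) + [None] * gap
        self.gap_start = len(text)      # first free slot
        self.gap_end = len(self.buf)    # one past the last free slot

    def _move_gap(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, extra=16):
        # Only called when the gap is empty: open new free slots in place.
        self.buf[self.gap_start:self.gap_start] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, s):
        self._move_gap(pos)
        for ch in s:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def text(self):
        # Everything outside the gap, in order.
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


def swap_paragraphs(paragraphs, i, j):
    """Reorder a document by exchanging two references; no text is copied."""
    paragraphs[i], paragraphs[j] = paragraphs[j], paragraphs[i]


g = GapBuffer("hello")
g.insert(0, "X")
print(g.text())

doc = ["First paragraph.", "Second paragraph.", "Third paragraph."]
swap_paragraphs(doc, 0, 2)
print(doc)
```

Both are container-level techniques for editing large documents; neither
says anything about how a single string value should be represented.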
> Again something like this could be used in the "character pool" manager
> object - though I'm not 100% sure.
Which, what, where, why character pool manager object? How?
> Please note, this is just a thought. I haven't written any Object
> Pascal code implementing something like this - to prove the concept. I
> simply know the Flyweight pattern and it seems to be a possible
> option.
And we are trying to get to the bottom of that feel.
Let me summarize this ENTIRE discussion up to now (this also goes for
the other posters):
1a) objects -> good
1b) not object -> not good
2) flyweight pattern will be a good string type.
3) A "+" for string concatenation is frowned upon in good OOP circles.
4) The Java string type is an immutable object.
5) C++ _possibly_ has some problems efficiently coding s[x] using class string
types.
Which for practical relevance to the unicodestring type can be further
summarized to the empty set.
So in short: while I'm not entirely fond of an OOP approach to strings
(simply because I have never seen one that fits in with a language such as
Delphi/FPC), I'm willing to hear the arguments.
But we are now several tens of posts in this subthread, and there has been
absolutely no information at all!
> Remember, Unicode support is much more than simply storing and
> displaying text
Displaying text is already pretty much out of the scope of the unicodestring
type that is the subject of this thread.