[fpc-devel] Unicodestring branch, please test and help fixing
Michael Van Canneyt
michael at freepascal.org
Wed Sep 10 13:45:31 CEST 2008
On Wed, 10 Sep 2008, Graeme Geldenhuys wrote:
> On 9/10/08, Marco van de Voort <marcov at stack.nl> wrote:
> >
> > Like everybody, I have read GOF several times, and even got some of the
> > successor books.
>
> I don't think anybody has read GOF only once. :-)
>
>
> > The problem is how it applies to strings, and how they can be more
> > memory saving than a straight array of 16-bit values which are
> > copy-on-write.
>
> I think for a good code example of this, have a look at Java's
> Document class. It's not exactly what I'm talking about, but it's got
> the idea. The Document class forms the basic storing medium of all
> their text-based components - from a simple TextEdit, TextArea to
> complex rich text documents. So it scales well.
>
> Each character can have individual characteristics set. Storage down
> to character level. Similar to what I am suggesting with the Flyweight
> pattern - characters of a string with encoding information.
>
> The Document class also uses an internal gapped buffer implementation
> to store its content - apparently good for performance. Again
> something like this could be used in the "character pool" manager
> object - though I'm not 100% sure.
>
>
> Please note, this is just a thought. I haven't written any Object
> Pascal code implementing something like this - to prove the concept. I
> simply know the Flyweight pattern and it seems to be a possible
> option.
>
> Remember, Unicode support is much more than simply storing and
> displaying text. You have various encodings, RTL or LTR direction etc.
> I can't see how a simple type can keep track of all such information
> - but then, I don't know the internals of FPC either. ;-)
You are mixing two things: the actual string content and the string metadata.
Only the metadata is a candidate for the flyweight pattern. But there is
nothing to be gained by putting the metadata in an object, since it consists
of only the encoding. Storing the encoding in an object is ridiculous and a
waste of heap space: a 2-byte encoding field is less wasteful than a 4- or
8-byte object pointer.
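To put rough numbers on the field-versus-pointer point, here is a sketch using Python's ctypes to model two hypothetical string headers. The field names and order are assumptions for illustration, not FPC's actual string header: one layout stores the encoding inline as a 2-byte code page, the other keeps a pointer to a shared, flyweight-style encoding object.

```python
import ctypes

# Hypothetical string header with the encoding stored inline as a
# 2-byte code page number (e.g. 65001 for UTF-8).
class InlineHeader(ctypes.Structure):
    _fields_ = [
        ("codepage", ctypes.c_uint16),  # encoding metadata, 2 bytes
        ("elemsize", ctypes.c_uint16),  # bytes per code unit (1, 2, ...)
        ("refcount", ctypes.c_int32),   # for copy-on-write sharing
        ("length", ctypes.c_int32),     # number of code units
    ]

# Hypothetical flyweight-style header: the encoding lives in a shared
# object on the heap and every string keeps a pointer to it.
class PointerHeader(ctypes.Structure):
    _fields_ = [
        ("encoding", ctypes.c_void_p),  # 4 or 8 bytes depending on platform
        ("elemsize", ctypes.c_uint16),
        ("refcount", ctypes.c_int32),
        ("length", ctypes.c_int32),
    ]

def header_overhead(header_type, string_count):
    """Total header bytes for string_count strings using the given layout."""
    return ctypes.sizeof(header_type) * string_count

if __name__ == "__main__":
    print("pointer size:", ctypes.sizeof(ctypes.c_void_p))
    print("inline header:", ctypes.sizeof(InlineHeader))    # 12
    print("pointer header:", ctypes.sizeof(PointerHeader))  # 24 on 64-bit, 16 on 32-bit
    n = 1_000_000
    extra = header_overhead(PointerHeader, n) - header_overhead(InlineHeader, n)
    print(f"extra header bytes for {n} strings:", extra)
```

Alignment padding makes the gap larger than the raw 2-versus-8 bytes on a 64-bit target, and the pointer variant still needs the shared encoding object itself on the heap. The exact figures depend on field order and platform, so treat them as illustrative.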
The main problem with the GOF book is that
"If your only tool is a hammer, you tend to think of every problem as a nail."
Objects are not the ne plus ultra of programming. They are useful in a
very broad area, but not everything should be done with objects, because
they do carry overhead.
Strings are such a case where objects are simply too cumbersome.
Michael.