[fpc-devel] Unicodestring branch, please test and help fixing

Marco van de Voort marcov at stack.nl
Wed Sep 10 15:02:51 CEST 2008


In our previous episode, Graeme Geldenhuys said:
> >  The problem is how it applies to strings, and how they can be more
> >  memory saving than a straight array of 16-bit values which are
> >  copy-on-write.
> 
> I think for a good code example of this, have a look at Java's
> Document class. It's not exactly what I'm talking about, but it's got
> the idea. The Document class forms the basic storing medium of all
> their text based components - from a simple TextEdit, TextArea to
> complex rich text documents. So it scales well.

How can you say that? For a text component the limit is whether a person
notices the delay, but a main string type must also serve server systems
that import a database export of several GB.

> Each character can have individual characteristics set. Storage down
> to character level. Similar to what I am suggesting with the Flyweight
> pattern - characters of a string with encoding information.

I can't see how you could stuff that in less than 16 bits? (since that would
be the storage now, and you said it would save memory) 
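
To put rough numbers on that question, here is a back-of-the-envelope sketch
(in Python purely for brevity). It assumes a flyweight string would keep one
reference per character into a shared pool, which is only my reading of the
proposal; the function names are made up for illustration, and the pool
itself is not even counted.

```python
import struct

# Size of a native pointer on this platform (typically 4 or 8 bytes).
POINTER_BYTES = struct.calcsize("P")

def utf16_bytes(text):
    """Payload size of a plain array of 16-bit code units."""
    return len(text.encode("utf-16-le"))

def flyweight_bytes(text, ref_bytes=POINTER_BYTES):
    """Payload size if every character is a reference into a shared pool.

    Assumption: one reference per character; the pool is ignored.
    """
    return len(text) * ref_bytes

if __name__ == "__main__":
    sample = "hello"
    print("utf-16 array :", utf16_bytes(sample), "bytes")
    print("flyweight    :", flyweight_bytes(sample), "bytes")
```

Under these assumptions, the references alone take two to four times the
space of the 16-bit array with 4- or 8-byte pointers.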
 
> The Document class also uses an internal gapped buffer implementation
> to store it's content - apparently good for performance. 

It is one of many ways to avoid big delays on big continuous documents.
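
For readers who have not met the technique, a minimal gap buffer could look
roughly like the sketch below (Python for brevity). This is not how Java's
Document class is implemented; the class and method names are hypothetical.
The point is only that an edit near the gap moves a few characters instead
of shifting the whole tail of the text.

```python
class GapBuffer:
    """Text kept in one array with a movable gap at the edit position."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first free slot
        self.gap_end = len(self.buf)    # first used slot after the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def _move_gap(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, extra):
        # Rebuild the array with a fresh gap; the only full copy.
        tail = self.buf[self.gap_end:]
        self.buf = self.buf[:self.gap_start] + [None] * extra + tail
        self.gap_end = self.gap_start + extra

    def insert(self, pos, text):
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)
        if self.gap_end - self.gap_start < len(text):
            self._grow(len(text) + 16)
        for ch in text:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        if not 0 <= pos <= len(self):
            raise IndexError("delete position out of range")
        self._move_gap(pos)
        # Deleting just widens the gap over the following characters.
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Repeated inserts at the same spot are cheap because the gap is already
there; jumping to a distant position costs a move proportional to the
distance.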

E.g. Word uses (classically) a different approach, where the document is a
set of references to paragraphs. That way you can swap entire paragraphs by
manipulating a few pointers.
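
A toy illustration of that idea (Python for brevity; it says nothing about
Word's real file or memory format, and the names are hypothetical): the
document holds references to paragraph objects, so swapping two paragraphs
exchanges two references and copies no text.

```python
class Paragraph:
    """One paragraph of text, stored once and referred to by the document."""

    def __init__(self, text):
        self.text = text

def swap_paragraphs(doc, i, j):
    # Only the two references change place; the paragraph text is untouched.
    doc[i], doc[j] = doc[j], doc[i]

if __name__ == "__main__":
    doc = [Paragraph("first"), Paragraph("second"), Paragraph("third")]
    swap_paragraphs(doc, 0, 2)
    print([p.text for p in doc])
```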

It is also totally unrelated to string handling.

> Again something like this could be used in the "character pool" manager
> object - though I'm not 100% sure.

Which, what, where, why character pool manager object? How 

> Please note, this is just a thought. I haven't written any Object
> Pascal code implementing something like this - to prove the concept. I
> simply know the Flyweight pattern and it seems to be a possible
> option.

And we are trying to get to the bottom of that feeling.

Let me summarize this ENTIRE discussion up to now (this also goes for
the other posters):

1a) objects -> good   
1b) not object -> not good
2) flyweight pattern will be a good string type.
3) A "+" for string concatenation is frowned upon in good OOP circles.
4) The Java string type is an immutable object.
5) C++ _possibly_ has some problems efficiently coding s[x] using class string
   types.

Which for practical relevance to the unicodestring type can be further
summarized to the empty set.

So in short: while I'm not entirely fond of an OOP approach to strings
(simply because I have never seen one that fits in with a language like
Delphi/FPC), I'm willing to hear the arguments.

But we are now several tens of posts into this subthread, and there has been
absolutely no information at all!

> Remember, Unicode support is much more that simply storing and
> displaying text

Displaying text is already pretty much out of the scope of the unicodestring
type that is the subject of this thread.



