[fpc-devel] Memory consumed by strings

Sun Nov 23 13:34:29 CET 2008

On Sun, 23 Nov 2008 14:11:50 +0200
listmember <listmember at letterboxes.org> wrote:

>[...]
> > For very large projects, that should probably be done anyway at some
> > point. But even in that case, using a more memory-efficient string
> > type enables you to keep more data in memory and hence potentially
> > obtain better performance.
> 
> The last time I joined a relevant discussion, I was told worrying
> about native UCS-4 string-type would be pointless simply because that
> sort of thing is really needed for word processors only.
> 
> Now, I have been informed that Lazarus (and perhaps other IDEs) use 
> upwards of 50 MB string space just to do one of their basic
> operations.
> 
> That leaves me wondering how much do we lose performance-wise in 
> endlessly decompressing UTF-8 data, instead of using, say, UCS-4
> strings.

I'm wondering what you mean by 'endlessly decompressing UTF-8 data'.
You have to make a compromise between memory use, ease of use and
compatibility. There is no solution without drawbacks.

If you want to process large 8-bit text files, then UTF-8 is better.
If you want to paint glyphs, then normalized UTF-32 is better.
If you want Unicode with some memory overhead, reasonably easy usage
and compiler support for some compatibility, then UTF-16 is better.
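
To make the memory side of this trade-off concrete, here is a minimal
sketch in Python (not FPC code; the function name encoded_sizes and the
sample strings are made up for illustration). It only counts the bytes
each encoding needs for the same text, without a byte order mark, and
says nothing about processing speed.

```python
def encoded_sizes(text):
    """Return the byte count of `text` in each encoding, without a BOM."""
    return {
        # UTF-8: 1 to 4 bytes per code point
        "utf-8": len(text.encode("utf-8")),
        # UTF-16: 2 bytes per code point, 4 for those outside the BMP
        "utf-16": len(text.encode("utf-16-le")),
        # UTF-32: always 4 bytes per code point
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    # Hypothetical sample strings, chosen to cover different ranges.
    samples = {
        "ascii": "begin WriteLn('Hello'); end.",
        "greek": "αβγδ",
        "cjk": "漢字",
        "emoji": "😀",
    }
    for name, text in samples.items():
        print(name, len(text), encoded_sizes(text))
```

For ASCII-heavy text such as source code, UTF-8 needs 1 byte per
character, UTF-16 needs 2 and UTF-32 needs 4. For CJK text in the
basic plane it is the other way round between the first two: UTF-8
needs 3 bytes per character and UTF-16 needs 2.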

Mattias