[fpc-devel] Memory consumed by strings
listmember
listmember at letterboxes.org
Sun Nov 23 14:45:26 CET 2008
On 2008-11-23 14:34, Mattias Gaertner wrote:
> On Sun, 23 Nov 2008 14:11:50 +0200
> listmember <listmember at letterboxes.org> wrote:
>> That leaves me wondering how much we lose performance-wise by
>> endlessly decompressing UTF-8 data, instead of using, say, UCS-4
>> strings.
>
> I'm wondering what you mean by 'endlessly decompressing UTF-8
> data'.
I am referring to going to the nth character in a string. With UTF-8 it
is no longer simple arithmetic plus an index operation. You have to
start from the beginning and iterate until you reach your character, at
every step working out whether the current code point is 1, 2, 3 or 4
bytes long. Doing this is, in effect, decompression.
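Something like this untested sketch shows the cost (Utf8ByteIndex is
just an illustrative name, not an RTL routine):

function Utf8ByteIndex(const S: AnsiString; N: Integer): Integer;
var
  I, Skipped: Integer;
begin
  I := 1;
  Skipped := 0;
  { Skip the first N-1 code points; each step must decode a lead byte }
  while (I <= Length(S)) and (Skipped < N - 1) do
  begin
    case Ord(S[I]) of
      $00..$7F: Inc(I, 1);  { 1-byte sequence (ASCII) }
      $C0..$DF: Inc(I, 2);  { 2-byte sequence }
      $E0..$EF: Inc(I, 3);  { 3-byte sequence }
      $F0..$F7: Inc(I, 4);  { 4-byte sequence }
    else
      Inc(I);               { stray continuation byte: skip it }
    end;
    Inc(Skipped);
  end;
  Result := I;  { byte index where the Nth code point starts }
end;

With a fixed-width string, S[N] is one address computation; here it is
a scan over N-1 variable-length sequences every single time.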
> You have to make a compromise between memory, ease of use and
> compatibility. There is no solution without drawbacks.
>
> If you want to process large 8-bit text files, then UTF-8 is better.
> If you want to paint glyphs, then normalized UTF-32 is better.
> If you want Unicode with moderate memory overhead, reasonably easy
> usage, and some compiler support for compatibility, then UTF-16 is
> better.
Do we have to think in terms of encodings (which are, after all, ways
of compressing text) when what we actually mean is 1-byte, 2-byte and
4-byte-per-char strings?
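To make the trade-off concrete, a rough sketch (assuming a recent FPC
where UTF8Decode and UnicodeStringToUCS4String are available from the
System unit; this counts payload bytes only, not heap headers):

program StringSizes;
{$mode objfpc}{$H+}
var
  U8: UTF8String;
  U16: UnicodeString;
  U32: UCS4String;
begin
  { 'Grüße' spelled with explicit UTF-8 bytes to avoid source-encoding
    surprises: ü = C3 BC, ß = C3 9F }
  U8 := 'Gr' + #$C3#$BC + #$C3#$9F + 'e';
  U16 := UTF8Decode(U8);                  { 2 bytes per UTF-16 unit }
  U32 := UnicodeStringToUCS4String(U16);  { 4 bytes per code point }
  WriteLn('UTF-8 : ', Length(U8), ' bytes');
  WriteLn('UTF-16: ', Length(U16) * 2, ' bytes');
  { UCS4String carries a trailing #0 element; exclude it }
  WriteLn('UCS-4 : ', (Length(U32) - 1) * 4, ' bytes');
end.

The same five-character word should come out as 7, 10 and 20 bytes
respectively, which is exactly the compression the encoding buys.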