[fpc-devel] Memory consumed by strings

Graeme Geldenhuys graemeg.lists at gmail.com
Sun Nov 23 18:31:24 CET 2008


On Sun, Nov 23, 2008 at 3:45 PM, listmember <listmember at letterboxes.org> wrote:
>
> I am referring to going to the nth character in a string. With UTF-8 it is
> no more a simple arithmetic and an index operation. You have to start from
> zero and iterate until you get to your characters --at every step,
> calculating whether it is 2, 3 or 4 bytes long. Doing this is decompression.

Well, if the string is well-formed UTF-8, the first byte of each
character tells you how far to jump ahead, so you don't need to
visit each byte. You still step from character to character, but
you never have to look at the continuation bytes.
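
A minimal sketch of that jump-ahead, written in Python for brevity rather
than Pascal. It assumes the input is already well-formed UTF-8 and does no
validation of continuation bytes; the function names are made up for
illustration only.

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte alone."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: single-byte (ASCII)
    if lead >> 5 == 0b110:
        return 2              # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3              # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4              # 11110xxx
    raise ValueError("not a UTF-8 lead byte")


def utf8_offset_of(data: bytes, n: int) -> int:
    """Byte offset of the n-th code point (0-based).

    Assumes well-formed UTF-8: only lead bytes are inspected, and the
    continuation bytes are skipped without being read.
    """
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("fewer code points than requested")
        pos += utf8_seq_len(data[pos])
    if pos >= len(data):
        raise IndexError("fewer code points than requested")
    return pos
```

The loop runs once per character, not once per byte, so the cost still
grows with n, but a 4-byte character costs one step instead of four.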

With UTF-16, you can't just jump to the n'th character either. You
have to check for surrogate pairs along the way.
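
The same walk for UTF-16 might look like the sketch below (again Python,
again assuming well-formed input, with hypothetical helper names). A code
unit in the range 0xD800..0xDBFF is a high surrogate and is followed by a
low surrogate, so the pair counts as one character.

```python
import struct


def utf16_units(text: str) -> list:
    """Return the 16-bit code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf16_index_of(units, n: int) -> int:
    """Code-unit index of the n-th code point (0-based).

    Assumes well-formed UTF-16: every high surrogate is followed by a
    low surrogate, so a high surrogate always means "skip two units".
    """
    pos = 0
    for _ in range(n):
        if pos >= len(units):
            raise IndexError("fewer code points than requested")
        is_high_surrogate = 0xD800 <= units[pos] <= 0xDBFF
        pos += 2 if is_high_surrogate else 1
    if pos >= len(units):
        raise IndexError("fewer code points than requested")
    return pos
```

Plain `units[n]` is only correct while no character outside the Basic
Multilingual Plane appears before position n.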

At least the good thing about UTF-8 is that you don't have to worry
about LE or BE byte order. UTF-16 and UTF-32 have that nasty issue.
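
A small illustration of the byte-order point, using Python's standard
codecs. The sample text is arbitrary; the variable names are only for this
sketch.

```python
text = "A\u20ac"  # 'A' followed by the euro sign U+20AC

# UTF-8 is defined as a byte sequence, so there is only one encoding.
utf8 = text.encode("utf-8")

# UTF-16 and UTF-32 are built from 2- and 4-byte units, and the bytes
# inside each unit can be stored in either order.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")
utf32_le = text.encode("utf-32-le")
utf32_be = text.encode("utf-32-be")

for name, data in [("utf-8", utf8),
                   ("utf-16-le", utf16_le), ("utf-16-be", utf16_be),
                   ("utf-32-le", utf32_le), ("utf-32-be", utf32_be)]:
    print("%-10s %s" % (name, data.hex()))

# Guessing the wrong order does not necessarily fail: here it decodes
# without error into two unrelated characters.
misread = utf16_le.decode("utf-16-be")
```

This is why UTF-16 and UTF-32 data usually needs a byte order mark or an
out-of-band agreement on the order, while UTF-8 data does not.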


Regards,
  - Graeme -


_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/


