[fpc-devel] Memory consumed by strings

Daniël Mantione daniel.mantione at freepascal.org
Sun Nov 23 13:10:28 CET 2008



On Sun, 23 Nov 2008, listmember wrote:

> What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); it 
> would still be UTF-8 or whatever.
>
> I am only considering the in-memory representation being UTF-32 (or UCS-4).
>
> This way, loading and saving would hardly be affected, yet in-memory 
> operations would be a lot faster and simpler.

For source code, an extended-ASCII encoding like UTF-8 is the best choice: 
all characters that need processing are in the ASCII range, so the code 
needs to do nothing with the high (non-ASCII) codes except keep each 
multibyte sequence in one piece.
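
A small illustration of why a byte-oriented scanner can work directly on 
UTF-8: every byte of a multibyte UTF-8 sequence lies in 0x80..0xFF, so it 
can never be mistaken for an ASCII delimiter. This is a sketch in Python 
rather than Pascal, and the helper name `split_on_ascii` is hypothetical; 
it splits encoded bytes on an ASCII separator without ever decoding them.

```python
def split_on_ascii(data: bytes, sep: int) -> list:
    """Split UTF-8 encoded bytes on a single ASCII separator byte.

    No decoding happens: bytes >= 0x80 belong to multibyte sequences
    and are copied through unchanged, so those sequences stay intact.
    """
    if sep >= 0x80:
        raise ValueError("separator must be an ASCII byte")
    parts = []
    current = bytearray()
    for b in data:
        if b == sep:
            parts.append(bytes(current))
            current.clear()
        else:
            current.append(b)  # ASCII byte or part of a multibyte sequence
    parts.append(bytes(current))
    return parts


source = "writeln('Dani\u00ebl');writeln('\u65e5\u672c');".encode("utf-8")
for piece in split_on_ascii(source, ord(";")):
    print(piece.decode("utf-8"))
```

Each piece decodes cleanly, because splitting on an ASCII byte can never 
cut a non-ASCII character in half.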

Therefore, any other encoding wastes memory and gains you no speed. For 
that reason, I don't see the compiler switching away from 8-bit processing 
either.
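
To put a number on the memory argument, the following Python sketch (the 
function name `encoded_sizes` and the sample line are illustrative only) 
measures how many bytes a purely ASCII line of Pascal source occupies in 
each encoding.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in each encoding.

    The "-le" codecs are used so that no byte-order mark is counted.
    """
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# A typical, purely ASCII line of Pascal source.
line = "if Length(S) > 0 then WriteLn(S);"
print(encoded_sizes(line))
```

For this 33-character line the sizes are 33, 66 and 132 bytes: UTF-16 
doubles and UTF-32 quadruples the storage while the scanner still only 
looks at ASCII characters.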

The situation is very different when processing real text: the 
memory-saving advantages disappear for the majority of the world (most 
non-Latin scripts need two or three bytes per character in UTF-8), and if 
you want to process characters beyond #127, UTF-16 and UTF-32 are much 
easier. Obviously, UTF-32 is the best encoding if characters you need to 
process lie beyond #65535, since UTF-16 needs surrogate pairs for those.
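
The following Python sketch (the function `profile` and the samples are 
illustrative assumptions) compares Cyrillic, CJK and one character beyond 
#65535 (U+1D11E, MUSICAL SYMBOL G CLEF) across the three encodings.

```python
def profile(text: str) -> dict:
    """Report code points, bytes, and UTF-16 code units for `text`."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # "-le": no byte-order mark
    utf32 = text.encode("utf-32-le")
    return {
        "chars": len(text),              # Python str length counts code points
        "utf8_bytes": len(utf8),
        "utf16_bytes": len(utf16),
        "utf16_units": len(utf16) // 2,  # more units than chars => surrogate pairs
        "utf32_bytes": len(utf32),
    }


samples = {
    "Cyrillic": "\u041f\u0440\u0438\u0432\u0435\u0442",  # "Privet"
    "CJK": "\u65e5\u672c\u8a9e",                          # "Nihongo"
    "beyond #65535": "\U0001D11E",                        # G clef
}
for name, text in samples.items():
    print(name, profile(text))
```

The Cyrillic word takes 12 bytes in both UTF-8 and UTF-16, the CJK word 
takes 9 bytes in UTF-8 against 6 in UTF-16, and the G clef needs two UTF-16 
code units for a single character, so UTF-16 is no longer fixed-width 
there while UTF-32 still is.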

UTF-32 is only a lot faster and simpler if you need to process characters 
(rather than just pass them on).
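
One concrete case of "processing characters" is random access to the n-th 
character. The Python sketch below (both helper names are hypothetical) 
contrasts UTF-8, where you must walk from the start because characters 
vary in width, with UTF-32, where the n-th character simply sits at byte 
offset 4*n.

```python
import struct


def nth_char_utf8(data: bytes, n: int) -> str:
    """Return the n-th character (0-based) of UTF-8 encoded `data`.

    UTF-8 is variable width, so this walks from the start: every byte
    that is not a continuation byte (0b10xxxxxx) begins a new character.
    Cost is O(len(data)).
    """
    count = -1
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte: a new character starts here
            count += 1
            if count == n:
                end = i + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1  # include the character's continuation bytes
                return data[i:end].decode("utf-8")
    raise IndexError(n)


def nth_char_utf32(data: bytes, n: int) -> str:
    """Return the n-th character of UTF-32LE `data` in O(1): offset 4*n."""
    if not 0 <= n < len(data) // 4:
        raise IndexError(n)
    (code_point,) = struct.unpack_from("<I", data, 4 * n)
    return chr(code_point)


text = "Dani\u00ebl \u65e5\u672c \U0001D11E"
u8, u32 = text.encode("utf-8"), text.encode("utf-32-le")
for n in range(len(text)):
    assert nth_char_utf8(u8, n) == nth_char_utf32(u32, n) == text[n]
print("both lookups agree")
```

Both functions return the same characters, but only the UTF-32 lookup 
avoids scanning the preceding data, which is the speed advantage that 
matters when the code actually inspects characters.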

Daniël

