[fpc-devel] Memory consumed by strings
Daniël Mantione
daniel.mantione at freepascal.org
Sun Nov 23 13:10:28 CET 2008
On Sun, 23 Nov 2008, listmember wrote:
> What I had in mind wasn't to store the string data in UTF-32 (or UCS-4); it
> would still be UTF-8 or whatever.
>
> I am only considering the in-memory representation being UTF-32 (or UCS-4).
>
> This way, loading and saving would hardly be affected, yet in-memory
> operations would be a lot faster and simpler.
For source code, an ASCII-compatible encoding like UTF-8 is the best choice:
all characters that need processing are in the ASCII range, so the code
needs to do nothing about the codes above #127 except keep each multibyte
sequence together in one piece.
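A minimal sketch of why this works, written in Python only for illustration (the function name `scan_string_literals` is hypothetical and not part of FPC). In UTF-8 every byte of a multibyte sequence is #128 or higher, so a scanner can look only for ASCII delimiters such as the quote character and copy all other bytes through untouched, without ever decoding them:

```python
def scan_string_literals(src: bytes) -> list[bytes]:
    """Return the contents of every '...'-quoted literal in UTF-8 source, as raw bytes.

    Only ASCII bytes drive decisions. Bytes >= 0x80 (parts of multibyte UTF-8
    sequences) are never inspected, just copied, and can never be mistaken for
    a quote because every byte of a multibyte sequence has its high bit set.
    """
    literals = []
    i, n = 0, len(src)
    while i < n:
        if src[i] == ord("'"):
            j = i + 1
            buf = bytearray()
            while j < n:
                if src[j] == ord("'"):
                    # Pascal escapes a quote inside a literal by doubling it: ''
                    if j + 1 < n and src[j + 1] == ord("'"):
                        buf.append(ord("'"))
                        j += 2
                        continue
                    break  # closing quote
                buf.append(src[j])  # any byte, including >= 0x80, copied as-is
                j += 1
            literals.append(bytes(buf))
            i = j + 1
        else:
            i += 1
    return literals


source = "s := 'Ελλάδα'; t := 'it''s';".encode("utf-8")
print(scan_string_literals(source))
```

The Greek literal comes back byte-for-byte intact even though the scanner has no idea what those bytes mean.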
Therefore, any other encoding is a waste of memory and does not gain you
any speed. For that reason, I don't see the compiler switching away from
8-bit processing either.
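To put numbers on the memory argument, here is a small Python sketch (the helper `encoded_sizes` is hypothetical) that measures an ASCII-only Pascal statement. Every ASCII character takes 1 byte in UTF-8, 2 in UTF-16 and 4 in UTF-32, so the wider encodings simply double or quadruple the size of typical source code:

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Bytes needed to hold text in each encoding.

    The -le codec variants are used so that no byte-order mark is counted.
    """
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


line = "for i := 0 to High(a) do a[i] := i * 2;"
print(encoded_sizes(line))  # sizes are in the ratio 1 : 2 : 4
```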
The situation is very different when processing real text: the
memory-saving advantages disappear for the majority of the world, and if
you want to process characters beyond #127, UTF-16 and UTF-32 are much
easier. Obviously, UTF-32 is the best encoding if the characters you need
to process are beyond #65535, since UTF-16 has to represent those as
surrogate pairs.
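A short Python illustration of both points (the helper `code_units` and the sample strings are chosen only for illustration). It counts code units per encoding for a Russian word, a Chinese phrase and U+1D11E, a character beyond #65535:

```python
def code_units(text: str) -> dict[str, int]:
    """Count code points and code units: UTF-8 bytes, UTF-16 16-bit units, UTF-32 32-bit units."""
    return {
        "chars": len(text),  # a Python str's length is its number of code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf32_units": len(text.encode("utf-32-le")) // 4,
    }


for sample in ("привет", "你好世界", "\U0001D11E"):
    print(repr(sample), code_units(sample))
```

For the Russian word, UTF-8 and UTF-16 both need 12 bytes, so UTF-8 saves nothing; for the Chinese text, UTF-8 needs 12 bytes against 8 for UTF-16. U+1D11E needs a surrogate pair (2 code units) in UTF-16 but a single code unit in UTF-32, which is what makes UTF-32 the simplest choice there.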
Only if you need to process characters (rather than just pass them on) is
UTF-32 a lot faster and simpler.
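A sketch of the "process characters" case, assuming the task is random access to the n-th character. In UTF-8 that character can only be located by scanning from the start (O(n)), whereas in UTF-32 it sits at byte offset 4*n (O(1)). Both function names are hypothetical, and the UTF-8 version assumes well-formed input:

```python
def nth_code_point_utf8(data: bytes, n: int) -> int:
    """Return the n-th (0-based) code point of UTF-8 data by scanning: O(n)."""
    count = 0
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte tells how long the sequence is (assumes valid UTF-8).
        if lead < 0x80:
            length = 1
        elif lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        else:
            length = 4
        if count == n:
            return ord(data[i:i + length].decode("utf-8"))
        count += 1
        i += length
    raise IndexError("code point index out of range")


def nth_code_point_utf32(data: bytes, n: int) -> int:
    """Return the n-th (0-based) code point of UTF-32LE data: a direct O(1) lookup."""
    if not 0 <= n < len(data) // 4:
        raise IndexError("code point index out of range")
    return int.from_bytes(data[4 * n:4 * n + 4], "little")


text = "aπ你\U0001D11Eb"
utf8, utf32 = text.encode("utf-8"), text.encode("utf-32-le")
print([nth_code_point_utf8(utf8, k) == nth_code_point_utf32(utf32, k) for k in range(len(text))])
```

Both return the same code points; the difference is that the UTF-8 lookup has to walk every preceding sequence, which is the cost that disappears when characters are stored as fixed-width 32-bit units.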
Daniël