[fpc-devel] Memory consumed by strings
listmember at letterboxes.org
Sun Nov 23 15:02:16 CET 2008
On 2008-11-23 14:49, Daniël Mantione wrote:
> Op Sun, 23 Nov 2008, schreef Jonas Maebe:
>> On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
>>> For an IDE, this is a little bit more complicated. I.e. searching for
>>> a ç in a source file needs to find both the composed and the
>>> decomposed variant, and in the case of UTF-8, this character can be
>>> encoded in 1, 2, 3 or 4 bytes which all need to be found. This is
>>> where UTF-16 and UTF-32 start to make sense.
>> Characters can also be decomposed in UTF-16 and in UTF-32 (for the
>> same reasons as in UTF-8).
> I am aware of that, but the combining cedille is not in the "easy to
> process range" of UTF-8. In other words, you cannot do
> "if char[i]=combining_cedille" in UTF-8.
> Instead, in UTF-8, you need to make sure the string has enough characters
> left, and then compare multiple characters. Heck, you even need to take
> care of the fact that the combining cedille can be encoded in 2, 3 or 4
This is one of the million and one small details that one has to keep in
mind while programming.
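
To make the detail discussed above concrete, here is a minimal sketch in
Python (chosen only because its standard library ships Unicode
normalization; it is not FPC code). The names encoded_sizes and
find_c_cedilla are made up for this illustration. It shows the byte sizes
of the combining cedilla (U+0327) in each encoding, and one possible way
to find both the composed and the decomposed variant by normalizing
before searching.

```python
import unicodedata

COMPOSED = "\u00e7"           # c with cedilla as one precomposed code point
DECOMPOSED = "c\u0327"        # plain c followed by U+0327 COMBINING CEDILLA
COMBINING_CEDILLA = "\u0327"


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def find_c_cedilla(haystack):
    """Find either variant by composing (NFC) the haystack first.

    The returned index refers to the normalized text, not to the input.
    """
    normalized = unicodedata.normalize("NFC", haystack)
    return normalized.find(COMPOSED)


if __name__ == "__main__":
    print(encoded_sizes(COMBINING_CEDILLA))
    print(find_c_cedilla("gar\u00e7on"), find_c_cedilla("garc\u0327on"))
```

In UTF-16 and UTF-32 the combining cedilla fits in a single code unit, so
a one-element comparison can match it. In UTF-8 it takes two bytes, so
the comparison has to look at more than one element.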
What I think would be more sensible is, instead of dealing with all these
variable sizes, to simply use strings with 4 bytes per char and compose (in
the Unicode sense) everything into that string.
You do this once, when importing or loading text into your app. From then on,
everything is just like the good old string, except that it is a 4-byte
per char string instead of a 1-byte one.
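
As a rough sketch of that "compose once on import" idea, again in Python
rather than Pascal and under my own assumptions: the input is taken to
be valid UTF-8, NFC is used as the composition step, and the helper
names load_as_four_byte_string and char_at are hypothetical.

```python
import unicodedata


def load_as_four_byte_string(raw_utf8):
    """Decode UTF-8 bytes, compose them (NFC), and store 4 bytes per code point."""
    text = raw_utf8.decode("utf-8")
    composed = unicodedata.normalize("NFC", text)
    # UTF-32 little-endian without a BOM: exactly 4 bytes per code point.
    return composed.encode("utf-32-le")


def char_at(buf, index):
    """Return the code point stored at position index of a 4-byte-per-char buffer."""
    start = 4 * index
    return int.from_bytes(buf[start:start + 4], "little")


if __name__ == "__main__":
    # Input uses the decomposed form: c followed by U+0327.
    buf = load_as_four_byte_string("garc\u0327on".encode("utf-8"))
    print(len(buf) // 4, hex(char_at(buf, 3)))
```

After loading, indexing is a plain multiply-by-four, like with the old
1-byte strings. One caveat: not every base-plus-combining sequence has a
precomposed form, so composition alone does not guarantee one element
per visible character.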
Now, my question is this: How would I create a 'FourByteString' type,
reference counted etc. just like the usual 'String'?
How hard is it?
Can someone like me, who does not speak assembler, do it?
If so, where do I begin copy&pasting from 'string'?