[fpc-devel] Memory consumed by strings
Daniël Mantione
daniel.mantione at freepascal.org
Sun Nov 23 13:49:32 CET 2008
On Sun, 23 Nov 2008, Jonas Maebe wrote:
>
> On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
>
>> For an IDE, this is a little bit more complicated. I.e. searching for a ç
>> in a source file needs to find both the composed and the decomposed
>> variant, and in the case of UTF-8, this character can be encoded in 1, 2, 3
>> or 4 bytes which all need to be found. This is where UTF-16 and UTF-32
>> start to make sense.
>
> Characters can also be decomposed in UTF-16 and in UTF-32 (for the same
> reasons as in UTF-8).
I am aware of that, but the combining cedilla is not in the "easy to
process range" of UTF-8. In other words, you cannot do
"if char[i]=combining_cedilla" in UTF-8.
Instead, in UTF-8 you need to make sure the string has enough bytes
left, and then compare multiple bytes. Heck, you even need to take
care of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.
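
To make the difference concrete, here is a rough sketch in Python rather than Pascal (the function and constant names are made up for illustration). It assumes well-formed UTF-8, where the combining cedilla U+0327 is the two-byte sequence CC A7; longer, non-canonical encodings are not handled. The first function does the single comparison per element that fixed-width code units allow, the second does the bounds check plus multi-byte comparison that UTF-8 requires.

```python
COMBINING_CEDILLA = "\u0327"                       # U+0327 COMBINING CEDILLA
CEDILLA_UTF8 = COMBINING_CEDILLA.encode("utf-8")   # two bytes: CC A7


def find_cedilla_code_points(text):
    """One comparison per element, like "if char[i]=combining_cedilla"."""
    return [i for i, ch in enumerate(text) if ch == COMBINING_CEDILLA]


def find_cedilla_utf8(data):
    """Scan raw UTF-8 bytes: check remaining length, then compare a byte run."""
    hits = []
    n = len(CEDILLA_UTF8)
    i = 0
    while i + n <= len(data):              # enough bytes left for a match?
        if data[i:i + n] == CEDILLA_UTF8:  # compare several bytes at once
            hits.append(i)
        i += 1
    return hits


# "façade" uses the precomposed U+00E7; the trailing "c" + U+0327 is decomposed.
sample = "fa\u00e7ade c\u0327"
code_point_hits = find_cedilla_code_points(sample)
byte_hits = find_cedilla_utf8(sample.encode("utf-8"))
```

Note that neither function finds the precomposed ç in "façade"; matching both forms still needs normalization or a second pattern, whichever encoding is used.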
Daniël