[fpc-devel] Memory consumed by strings

Daniël Mantione daniel.mantione at freepascal.org
Sun Nov 23 13:49:32 CET 2008



On Sun, 23 Nov 2008, Jonas Maebe wrote:

>
> On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
>
>> For an IDE, this is a little bit more complicated. I.e. searching for a ç 
>> in a source file needs to find both the composed and the decomposed 
>> variant, and in the case of UTF-8, this character can be encoded in 1, 2, 3 
>> or 4 bytes which all need to be found. This is where UTF-16 and UTF-32 
>> start to make sense.
>
> Characters can also be decomposed in UTF-16 and in UTF-32 (for the same 
> reasons as in UTF-8).

I am aware of that, but the combining cedilla is not in the "easy to
process range" of UTF-8 (the single-byte ASCII range). In other words, you
cannot do "if char[i]=combining_cedille" in UTF-8.
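A minimal sketch of this point, written in Python purely for illustration (the thread concerns Free Pascal). It assumes a string held as a list of code points stands in for UTF-32, where one element is one code point. The names `COMBINING_CEDILLA`, `found_utf32` and `found_utf8_single` are hypothetical. U+0327 COMBINING CEDILLA fits in one element there, but in UTF-8 it is stored as more than one byte, so no single byte can equal it:

```python
# Hypothetical illustration: a per-element test for U+0327 works when each
# element is a whole code point (UTF-32 style), but not on UTF-8 bytes.
COMBINING_CEDILLA = 0x0327  # U+0327 COMBINING CEDILLA

text = "franc\u0327ais"  # decomposed form: 'c' followed by U+0327

# One element per code point: a single comparison finds the cedilla.
utf32_units = [ord(ch) for ch in text]
found_utf32 = [i for i, u in enumerate(utf32_units) if u == COMBINING_CEDILLA]

# UTF-8 bytes are 0..255, so no single byte can ever equal 0x0327.
utf8_bytes = text.encode("utf-8")
found_utf8_single = [i for i, b in enumerate(utf8_bytes) if b == COMBINING_CEDILLA]

print(found_utf32)        # [5]
print(found_utf8_single)  # []
print(" ".join(f"{b:02X}" for b in "\u0327".encode("utf-8")))  # CC A7
```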

In UTF-8, instead, you need to make sure the string has enough bytes
left, and then compare several bytes. Heck, you even need to take care
of the fact that the combining cedilla can be encoded in 2, 3 or 4
bytes.
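The multi-byte comparison described above, combined with the need (from the quoted message) to find both the composed and the decomposed ç, might look like the following Python sketch. The helper `find_c_cedilla` and its two byte patterns are hypothetical, and only the shortest-form UTF-8 encodings produced by Python's encoder are matched. The explicit length check mirrors "make sure the string has enough bytes left" (Python slicing would not fail without it, but a Pascal `char[i+1]` access could run past the end):

```python
# Hypothetical helper: find every "ç" in UTF-8 data, whether stored
# precomposed (U+00E7) or decomposed ('c' + U+0327 COMBINING CEDILLA).
PRECOMPOSED = "\u00e7".encode("utf-8")  # b'\xc3\xa7'  (2 bytes)
DECOMPOSED = "c\u0327".encode("utf-8")  # b'c\xcc\xa7' (3 bytes)


def find_c_cedilla(data: bytes) -> list:
    """Return (byte offset, byte length) for each ç found in UTF-8 data."""
    hits = []
    i = 0
    while i < len(data):
        for pattern in (DECOMPOSED, PRECOMPOSED):
            # Check that enough bytes remain before comparing several of them.
            if len(data) - i >= len(pattern) and data[i:i + len(pattern)] == pattern:
                hits.append((i, len(pattern)))
                i += len(pattern)
                break
        else:
            # No match here; step one byte. Both patterns start with an ASCII
            # or lead byte, never a continuation byte (0x80-0xBF), so
            # a match cannot begin in the middle of another character.
            i += 1
    return hits


sample = "façade, franc\u0327ais".encode("utf-8")
print(find_c_cedilla(sample))  # [(2, 2), (13, 3)]
```

A common alternative, not discussed in the thread, is to normalize both the text and the search key to one form first (in Python, `unicodedata.normalize("NFC", s)`), so that a single pattern suffices.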

Daniël

