[fpc-devel] Memory consumed by strings

Daniël Mantione daniel.mantione at freepascal.org
Sun Nov 23 13:31:15 CET 2008



On Sun, 23 Nov 2008, listmember wrote:

> On 2008-11-23 14:10, Daniël Mantione wrote:
>
>> Therefore, any other encoding is a waste of memory and does not gain you
>> any speed. For that reason, I don't see the compiler switch from 8-bit
>> processing either.
>
> I nearly fully agree with you.
>
> Except when a string constant needs to contain non-ASCII chars. What do 
> we do in these cases?

The common approach is to do nothing; no processing is needed. That is, 
the compiler just passes the bytes one by one from the source file to 
the object file.
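
As a rough illustration of that pass-through behaviour (a sketch only, not 
FPC code; the helper literal_bytes and the sample source line are made up 
for this example), the bytes between the quotes are copied without being 
decoded, so the constant simply keeps whatever encoding the source file had:

```python
# Hypothetical stand-in for a compiler copying a string constant verbatim.
# The same source line, saved once as UTF-8 and once as Latin-1.
source_utf8 = "s := 'garçon';".encode("utf-8")
source_latin1 = "s := 'garçon';".encode("latin-1")


def literal_bytes(source: bytes) -> bytes:
    """Return the bytes between the first pair of single quotes, unchanged."""
    start = source.index(b"'") + 1
    end = source.index(b"'", start)
    return source[start:end]


# No decoding or re-encoding happens; the bytes are passed on as they are.
utf8_literal = literal_bytes(source_utf8)
latin1_literal = literal_bytes(source_latin1)

print(utf8_literal, len(utf8_literal))
print(latin1_literal, len(latin1_literal))
```

The two results differ in length and content even though the source text 
looks identical in an editor, which is exactly why no processing is needed 
as long as the bytes are only passed on.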

For an IDE, this is a little more complicated. For example, searching for 
a ç in a source file needs to find both the composed and the decomposed 
variant, and in UTF-8 a character occupies anywhere from 1 to 4 bytes, so 
the variants are different byte sequences that all need to be found. This 
is where UTF-16 and UTF-32 start to make sense.
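
The search problem can be sketched as follows. This is only an 
illustration under assumptions: the helper find_normalized is hypothetical, 
and normalizing both sides to NFC is one possible way for an editor to 
match the two variants, not something the compiler or any particular IDE 
is known to do.

```python
import unicodedata

composed = "\u00e7"      # ç as one code point (LATIN SMALL LETTER C WITH CEDILLA)
decomposed = "c\u0327"   # c followed by COMBINING CEDILLA

text_a = "gar" + composed + "on"
text_b = "gar" + decomposed + "on"


def find_normalized(haystack: str, needle: str) -> int:
    """Search after normalizing both strings to NFC.

    Returns the index within the normalized haystack, or -1 if not found.
    """
    h = unicodedata.normalize("NFC", haystack)
    n = unicodedata.normalize("NFC", needle)
    return h.find(n)


# A plain byte search for the composed form misses the decomposed form.
raw_hit = text_b.encode("utf-8").find(composed.encode("utf-8"))

# After normalization both variants are found at the same position.
hit_a = find_normalized(text_a, decomposed)
hit_b = find_normalized(text_b, composed)

# Byte lengths of the two variants in UTF-8 and in UTF-32.
utf8_sizes = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
utf32_sizes = (len(composed.encode("utf-32-le")), len(decomposed.encode("utf-32-le")))
```

In UTF-32 every code point has the same width, which makes stepping through 
characters simple, but the composed and decomposed forms are still distinct 
sequences, so the matching step is needed in any encoding.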

>> Only if you need to process characters (rather than pass them on),
>> UTF-32 is a lot faster and simpler.
>
> Yes. If I knew how to write this patch, I'd be working on it right now.

Unfortunately a UTF-32 string type is not on our roadmap either, so it 
would have to be a user contribution.

Daniël

