[fpc-devel] Memory consumed by strings

Sun Nov 23 14:03:10 CET 2008

On Sun, 23 Nov 2008 13:49:32 +0100 (CET)
Daniël Mantione <daniel.mantione at freepascal.org> wrote:

> 
> 
> On Sun, 23 Nov 2008, Jonas Maebe wrote:
> 
> >
> > On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
> >
> >> For an IDE, this is a little bit more complicated. I.e. searching
> >> for a ç in a source file needs to find both the composed and the
> >> decomposed variant, and in the case of UTF-8, this character can
> >> be encoded in 1, 2, 3 or 4 bytes which all need to be found. This
> >> is where UTF-16 and UTF-32 start to make sense.
> >
> > Characters can also be decomposed in UTF-16 and in UTF-32 (for the
> > same reasons as in UTF-8).
> 
> I am aware of that, but the combining cedilla is not in the "easy to
> process range" of UTF-8. In other words, you cannot do
> "if char[i]=combining_cedilla" in UTF-8.
> 
> Instead, in UTF-8, you need to make sure the string has enough
> characters left, and then compare multiple characters. Heck, you even
> need to take care of the fact that the combining cedilla can be
> encoded in 2, 3 or 4 bytes.

Which means that there are three different Unicode codes for this
character, which means a single if-equal does not work in UTF-16 or
UTF-32 either.

if UTF8CharacterToUnicode(@s[i],CharLen) in
[cedille1,cedille2,cedille3] then
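
A minimal, self-contained sketch of the decode-then-compare idea in the
line above. DecodeUTF8 is a hypothetical stand-in for
UTF8CharacterToUnicode and assumes well-formed input. The constants are
the combining cedilla (U+0327) and the precomposed ç (U+00E7), chosen
here only for illustration. Because a Pascal set can only hold ordinal
values up to 255, the sketch uses plain comparisons rather than "in".

```pascal
program CedillaSearch;
{$mode objfpc}{$H+}

const
  CombiningCedilla    = $0327; // U+0327 COMBINING CEDILLA
  PrecomposedCCedilla = $00E7; // U+00E7 LATIN SMALL LETTER C WITH CEDILLA

// Hypothetical helper: decodes the code point starting at p and returns
// its length in bytes in CharLen. Assumes well-formed UTF-8 and performs
// no validation (a stray continuation byte is not detected).
function DecodeUTF8(p: PChar; out CharLen: Integer): Cardinal;
var
  b: Byte;
begin
  b := Ord(p^);
  if b < $80 then
  begin
    CharLen := 1;
    Result := b;
  end
  else if (b and $E0) = $C0 then
  begin
    CharLen := 2;
    Result := (Cardinal(b and $1F) shl 6)
           or Cardinal(Ord(p[1]) and $3F);
  end
  else if (b and $F0) = $E0 then
  begin
    CharLen := 3;
    Result := (Cardinal(b and $0F) shl 12)
           or (Cardinal(Ord(p[1]) and $3F) shl 6)
           or Cardinal(Ord(p[2]) and $3F);
  end
  else
  begin
    CharLen := 4;
    Result := (Cardinal(b and $07) shl 18)
           or (Cardinal(Ord(p[1]) and $3F) shl 12)
           or (Cardinal(Ord(p[2]) and $3F) shl 6)
           or Cardinal(Ord(p[3]) and $3F);
  end;
end;

var
  s: AnsiString;
  i, CharLen: Integer;
  cp: Cardinal;
begin
  // 'c' + U+0327 (bytes CC A7), followed by U+00E7 (bytes C3 A7)
  s := 'c' + Chr($CC) + Chr($A7) + Chr($C3) + Chr($A7);
  i := 1;
  while i <= Length(s) do
  begin
    cp := DecodeUTF8(@s[i], CharLen);
    if (cp = CombiningCedilla) or (cp = PrecomposedCCedilla) then
      WriteLn('cedilla-related code point at byte ', i);
    Inc(i, CharLen); // step by whole code points, not by bytes
  end;
end.
```

Stepping by CharLen keeps the loop on code point boundaries, so the
comparison itself is the same single test it would be in UTF-32. Finding
both the composed and the decomposed spelling still needs more than one
constant (or a normalization pass first), whatever the encoding.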

Mattias