[fpc-devel] Unicode support (yet again)

Dimitri Smits smitco at telenet.be
Fri Sep 16 12:22:47 CEST 2011


----- "Graeme Geldenhuys" <graemeg.lists at gmail.com> wrote:

> On 16/09/2011 00:01, Dimitri Smits wrote:
> > 
> > errrm, utf-8 can have 6 octets representing one character,
> 
> Last time I checked, that was only in the very early stages of
> developing the utf-8 specification. Since then, the maximum size of
> a utf-8 code point is 4 bytes.
> 
> If you know otherwise, please post a URL. Here is the information I
> have:
> 
> "The original specification allowed for sequences of up to six bytes,
> covering numbers up to 31 bits (the original limit of the Universal
> Character Set). In November 2003 UTF-8 was restricted by RFC 3629 to
> four bytes covering only the range U+0000 to U+10FFFF, in order to
> match the constraints of the UTF-16 character encoding."
> 
>   http://en.wikipedia.org/wiki/UTF-8#History
> 

good to know.
I've learned about unicode/utf8 from the following links
http://www.joelonsoftware.com/articles/Unicode.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html

never bothered to look into the RFCs and/or the official unicode site(s).

when I follow the link on the second page above to the RFC you mentioned (RFC 3629), I do indeed see that it is 4 octets according to the RFC. However, when I follow the link to the unicode appendix (http://www.cl.cam.ac.uk/~mgk25/ucs/ISO-10646-UTF-8.html), mentioned on that same page (anchored link: http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8), I see that according to the ISO spec it still is (was?) 6.
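
to make the difference between the two limits concrete, here is a minimal sketch in Python. The function names (octets_original, octets_rfc3629) are made up for illustration and do not come from either spec; the range boundaries are the ones of the original six-octet scheme (up to 31 bits), and the RFC 3629 variant only adds the U+10FFFF cap and the ban on surrogate code points.

```python
# Inclusive upper bound of the code point range for each sequence length
# in the original UTF-8 scheme (1 to 6 octets, up to 31 bits).
ORIGINAL_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]

# RFC 3629 restricts UTF-8 to the range that UTF-16 can represent.
RFC3629_MAX = 0x10FFFF


def octets_original(cp):
    """Number of octets needed under the original six-octet scheme."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for length, limit in enumerate(ORIGINAL_LIMITS, start=1):
        if cp <= limit:
            return length
    raise ValueError("code point does not fit in 31 bits")


def octets_rfc3629(cp):
    """Number of octets needed under RFC 3629 (at most 4)."""
    if cp > RFC3629_MAX:
        raise ValueError("code point is beyond U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable")
    return octets_original(cp)


if __name__ == "__main__":
    # Cross-check against Python's own encoder, which follows RFC 3629.
    for cp in (0x41, 0xE9, 0x20AC, 0x10FFFF):
        assert octets_rfc3629(cp) == len(chr(cp).encode("utf-8"))
    print(octets_rfc3629(0x10FFFF))    # 4
    print(octets_original(0x7FFFFFFF))  # 6
```

so both statements hold, depending on which document you read: the highest code point allowed by the RFC needs 4 octets, while the older scheme needs 5 or 6 octets only for values above 0x1FFFFF, which the RFC no longer permits.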

kind regards,
Dimitri Smits


