[fpc-devel] Unicode support (yet again)

Graeme Geldenhuys graemeg.lists at gmail.com
Fri Sep 16 11:31:43 CEST 2011


On 16/09/2011 00:01, Dimitri Smits wrote:
> 
> errrm, utf-8 can have 6 octets representing one character,

Last time I checked, that was only in the very early stages of
developing the UTF-8 specification. Since then, the maximum size of a
UTF-8 encoded code point is 4 bytes.

If you know otherwise, please post a URL. Here is the information I have:

"The original specification allowed for sequences of up to six bytes,
covering numbers up to 31 bits (the original limit of the Universal
Character Set). In November 2003 UTF-8 was restricted by RFC 3629 to
four bytes covering only the range U+0000 to U+10FFFF, in order to match
the constraints of the UTF-16 character encoding."

  http://en.wikipedia.org/wiki/UTF-8#History
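
To make the four-byte limit concrete, here is a minimal sketch in Python (not fpGUI code; the function name utf8_length is only illustrative). It maps a code point to the number of bytes RFC 3629 UTF-8 needs for it and cross-checks the result against Python's own encoder:

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes RFC 3629 UTF-8 uses for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    return 4  # U+10000..U+10FFFF, the largest case


# Cross-check against the built-in encoder for one sample per length.
for cp in (0x41, 0xE9, 0x20AC, 0x10FFFF):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_length(cp)
    print(f"U+{cp:04X}: {len(encoded)} byte(s)")
```

Anything above U+10FFFF is rejected outright, which is why the five- and six-byte forms of the original specification never occur in valid UTF-8 today.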




> not forgetting those diacritics that are separate characters.

I'm representing a code point in TfpgChar.  If you want the "completed
character as is displayed on the screen", then simply normalize your
TfpgString first, then extract the "character".
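
As a rough illustration of the normalize-then-extract idea, here is a sketch using Python's standard unicodedata module rather than TfpgString (the helper name code_points is only illustrative). It assumes NFC is the normalization form meant here:

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "e" followed by COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

print(code_points(decomposed))  # two entries
print(code_points(composed))    # one entry after composition

# NFC can only compose where Unicode defines a precomposed code point.
# There is none for "q" with an acute accent, so this stays at two code points.
print(code_points(unicodedata.normalize("NFC", "q\u0301")))
```

So after normalization one code point often equals one displayed character, but sequences without a precomposed form still span several code points.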


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



