[fpc-devel] Unicodestring branch, please test and help fixing

Mattias Gaertner nc-gaertnma at netcologne.de
Thu Sep 11 23:54:27 CEST 2008


On Thu, 11 Sep 2008 22:56:49 +0200
Martin Schreiber <fpmse at bluewin.ch> wrote:

>[...]
> > Doesn't that mean we will be --by design-- unable to write something
> > like 'Yom Kippur (יוֹם כִּפּוּר)' on a caption?

Yes and more. See below.


> > This is why I keep asking that the 'TCharacter' or 'TChar' needs to
> > have a language attribute.
> >
> MSEgui has a richstringty type, a combination of a widestring and a
> dynamic array of formatting info. There are formatting infos for the
> changes only, a richstringty without formatting info has a nil
> pointer for the dynamic array. See lib/common/kernel/mserichstring.pas
> http://sourceforge.net/projects/mseide-msegui/
> 
> "
> type
>  newinfoty = (ni_bold=ord(fs_bold),ni_italic=ord(fs_italic),
>               ni_underline=ord(fs_underline),ni_strikeout=ord(fs_strikeout),
>               ni_selected=ord(fs_selected),
>               //same order as in fontstylety
>                  ni_fontcolor,ni_colorbackground,ni_delete);
>  newinfosty = set of newinfoty;
> 
> const
>  fonthandleflags = [ni_bold,ni_italic];
>  fontstyleflags = [ni_bold,ni_italic,ni_underline,ni_strikeout,ni_selected];
> 
> type
>  charstylety = record
>   fontcolor,colorbackground: pcolorty;
>   fontstyle: fontstylesty;
>  end;
>  pcharstylety = ^charstylety;
> 
>  charstylearty = array of charstylety;
> 
>  formatinfoty = record
>   index: integer;            //0-> from first char
>   newinfos: newinfosty;
>   style: charstylety;
>  end;
> 
>  pformatinfoty = ^formatinfoty;
>  formatinfoarty = array of formatinfoty;
>  pformatinfoarty = ^formatinfoarty;
> 
>  richstringty = record
>   text: msestring;
>   format: formatinfoarty;
>  end;
> "
> 
> It was designed for fast processing in MSEide source code editor.

It is fast, but it misses some Unicode features, such as compound
characters (a base letter followed by one or more combining marks).
For example, the Mac OS X file system stores German umlauts as compound
characters. MSEide shows the o-umlaut as an o followed by a box.
Lazarus SynEdit under gtk2 displays it correctly, because it uses pango,
which has an almost complete Unicode implementation. But editing is
wrong in SynEdit, because it does not handle compound characters
yet. Fortunately, typing an o-umlaut creates a 'normal' single character
in SynEdit. The native gtk2 widgets like TButton and TEdit
handle compound characters correctly.
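
To make the two forms of the o-umlaut concrete, here is a minimal sketch.
It is written in Python only because its standard library ships the
Unicode database; it is not how MSEide, SynEdit, or pango are implemented,
and the variable names are made up for illustration.

```python
import unicodedata

# Compound (decomposed) form: "o" followed by U+0308 COMBINING DIAERESIS.
# This is the kind of sequence a file name from Mac OS X can contain.
decomposed = "o\u0308"

# 'Normal' (precomposed) form: the single code point U+00F6.
precomposed = "\u00f6"

# Both render as the same glyph, but they differ in length and compare unequal.
lengths = (len(decomposed), len(precomposed))   # (2, 1)
equal_raw = decomposed == precomposed           # False

# Normalizing to NFC composes the pair into the single code point.
composed = unicodedata.normalize("NFC", decomposed)
equal_after_nfc = composed == precomposed       # True

# Normalizing to NFD goes the other way.
redecomposed = unicodedata.normalize("NFD", precomposed)
```

An editor that treats every code point as one character cell will see two
cells for the first form, which matches the "o followed by a box" display.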

I wonder how a TCharacter that supports all Unicode features would be
defined. It would probably be a monster that only a few text editors
would want to use.
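
As a rough illustration of why one "character" cannot be a fixed-size
value, the sketch below groups code points into user-visible units by
attaching combining marks to the preceding base. The function name
split_clusters is hypothetical, and the rule is deliberately simplified:
it looks only at the canonical combining class and ignores the rest of
the Unicode text segmentation rules (Hangul syllables, joiners, and so
on).

```python
import unicodedata

def split_clusters(text):
    """Group code points so that combining marks stay with their base.

    Simplified assumption: a code point with a non-zero canonical
    combining class belongs to the preceding cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # attach the mark to the previous base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# "Köln" with a decomposed o-umlaut: 5 code points, 4 clusters.
koeln = split_clusters("Ko\u0308ln")

# Hebrew "yom" (yod, vav + holam point, final mem): 4 code points, 3 clusters.
yom = split_clusters("\u05d9\u05d5\u05b9\u05dd")
```

Each cluster is a variable-length string, so a TCharacter that covers even
this simplified case already needs to hold more than one code point.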

Mattias


