[fpc-devel] Unicode RTL

Wed Nov 16 23:19:45 CET 2005


> -----Original Message-----
> From: fpc-devel-bounces at lists.freepascal.org
> [mailto:fpc-devel-bounces at lists.freepascal.org]On Behalf Of Daniël
> Mantione
> Sent: 16 November 2005 21:58
> To: FPC developers' list
> Subject: RE: [fpc-devel] Unicode RTL
>
>
> Op Wed, 16 Nov 2005, schreef peter green:
>
> >
> > > pos('ë','Daniël');
> > >
> > > ... has a different implementation for utf-8 and 8-bit code pages.
> > one little design feature of utf-8 is that it was carefully designed to be
> > friendly to byte-orientated code. No special precautions are needed for
> > substring matching in utf-8!
>
> Which is the "be ignorant about multibyte character sets" model. Nothing
> wrong with that model, but it has its limitations.
UTF-8, however, is far more friendly to that model than most legacy
multibyte character sets. Most importantly, you CAN'T get a false match
when doing byte-orientated substring matching on valid UTF-8 strings: lead
bytes and continuation bytes use disjoint value ranges, so a match can only
begin on a character boundary. And if some code does chop a UTF-8 string
mid-character, only the chopped character is lost.
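
A minimal sketch of both points, written in Python rather than Pascal so
the raw bytes are easy to inspect. The helper names byte_find and
is_char_boundary are made up for illustration, and Shift-JIS is used only
as one example of a legacy multibyte encoding; none of this is FPC RTL code.

```python
def byte_find(haystack: bytes, needle: bytes) -> int:
    """Plain byte-oriented substring search; returns a byte offset or -1."""
    return haystack.find(needle)


def is_char_boundary(data: bytes, offset: int) -> bool:
    """True unless offset points at a UTF-8 continuation byte (10xxxxxx)."""
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


# 'ë' is two bytes in UTF-8 (0xC3 0xAB), so 'Daniël' is 7 bytes long.
utf8 = "Daniël".encode("utf-8")

# The byte-level search finds 'ë' at byte offset 4, on a character boundary.
hit = byte_find(utf8, "ë".encode("utf-8"))
hit_on_boundary = is_char_boundary(utf8, hit)

# Chop the string in the middle of 'ë' (keep only its lead byte 0xC3).
# Decoding with replacement loses that one character and nothing else.
chopped = utf8[:5]
recovered = chopped.decode("utf-8", errors="replace")

# Contrast: in Shift-JIS the katakana 'ソ' is 0x83 0x5C, and 0x5C is also
# the ASCII backslash, so a byte-level search for a backslash matches
# inside the character.
sjis = "ソ".encode("shift_jis")
false_match = byte_find(sjis, b"\\")

print(hit, hit_on_boundary, repr(recovered), false_match)
```

The UTF-8 search can only ever land on a character boundary, while the
Shift-JIS search reports a backslash at offset 1 in a string that contains
none. Note that the offset returned for UTF-8 is a byte offset, not a
character index, which is where the two implementations of pos() would
still differ.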