[fpc-devel] Unicode and UTF8String

Mattias Gaertner nc-gaertnma at netcologne.de
Mon Dec 1 21:03:13 CET 2008

On Mon, 01 Dec 2008 20:40:14 +0100
Florian Klaempfl <florian at freepascal.org> wrote:

> Mattias Gaertner schrieb:
> > On Mon, 01 Dec 2008 16:36:23 +0100
> > Florian Klaempfl <florian at freepascal.org> wrote:
> >> [...] Martin Friebe schrieb:
> >>> I can not see how I can interpret RtlString[1]. If the result is
> >>> bigger than 128, then I must know what type it is. If it is ANSI,
> >>> it is a single byte char. If it is utf8, it is a sub-codepoint
> >>> which will be part of a codepoint.
> >>> If it is widestring, well yes, here breaks my assumption that
> >>> RtlString[1] returns a byte.... ouch
> >> I see this as a theoretical consideration. Please give a real world
> >> (!) code example where this causes a problem.
> > Can you give a real world example where a different RTLString for
> > each platform solves a problem?
> It solves for example the problem that there are platforms where no
> unicode support is available or desired

> and it avoids unneeded conversions.

I understand it 'avoids unneeded conversions' *inside* the RTL, by
adding implicit conversions to the code accessing the RTL.
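
To make the RtlString[1] question above concrete, here is a minimal
sketch of what "the first element" of the same text looks like in the
three encodings under discussion. It is written in Python purely as a
language-neutral illustration, not as Free Pascal or RTL code. The
helper name first_code_unit is hypothetical, and "ANSI" is assumed to
mean Latin-1 here; a real ANSI code page depends on the system.

```python
# Illustrative stand-in only: this is not Free Pascal RTL code.
# Assumption: "ANSI" means Latin-1; real ANSI code pages vary by system.
text = "\u00e4bc"  # the three characters a-umlaut, b, c


def first_code_unit(s, encoding, unit_size):
    """Return the first code unit of s in the given encoding as an integer."""
    data = s.encode(encoding)
    return int.from_bytes(data[:unit_size], "little")


# ANSI (Latin-1): one byte, and that byte is a complete character.
ansi_first = first_code_unit(text, "latin-1", 1)

# UTF-8: one byte, but only the lead byte of a two-byte sequence.
utf8_first = first_code_unit(text, "utf-8", 1)

# UTF-16: a 16-bit code unit, so it is not a byte at all.
utf16_first = first_code_unit(text, "utf-16-le", 2)

print(hex(ansi_first), hex(utf8_first), hex(utf16_first))
```

Code that inspects the first element therefore has to know which of
the three cases it is in, or convert to a known type first. That
conversion is the one that moves out of the RTL and into the caller.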

> I'd be fine using utf-16 on all platforms :)

Me too, at least for the file functions.
I have some doubts about classes.pp.

> >> If you assign the result of an rtl function to an rtlstring, this
> >> means either that you don't care about the type of rtlstring[1],
> >> or that knowing its type is rtlchar is enough for you. If you
> >> assign it to an ansistring/widestring or whatever, you know what
> >> you get.
> > What string type will be TStrings.Items and the many other strings
> > in the classes.pp?
> Not yet decided though I'd make them RTLString as well.

TStrings is dog slow, and the only reason it has stayed usable is that
assigning strings only involves reference counting.
If TStrings uses a platform-dependent string, every assignment to or
from a different string type needs a conversion, which is a big
performance problem.
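
A rough sketch of the difference, again in Python as an analogy only:
Python's shared string objects stand in for reference-counted strings,
and the list named items is a hypothetical stand-in for the contents
of a TStrings. It measures nothing about the real classes.pp.

```python
# Analogy only: Python objects stand in for reference-counted strings.
# "items" is a hypothetical stand-in for the contents of a TStrings.
items = ["item %d" % i for i in range(3)]

# Same string type: assignment shares the existing object,
# so no character data is copied.
same = items[0]
shares_buffer = same is items[0]

# Different encoding: each access has to build a new buffer.
converted = items[0].encode("utf-16-le")
copied_bytes = len(converted)

print(shares_buffer, copied_bytes)
```

In the first case the cost is independent of the string length. In the
second it grows with the length, and it is paid on every access.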


