[fpc-devel] Unicode support (yet again)

Sun Sep 18 19:16:18 CEST 2011

In our previous episode, DaWorm said:
> But isn't it O(n^2) only when actually using unicode strings?
> Wouldn't you also be able to do something like String.Encoding := Ansi
> and then all String[i] accesses would then be o(n) + x (where x is the
> overhead of run time checking that it is safe to just use a memory
> offset, presumably fairly short)? Of course it would be up to the user
> to choose to reencode some string he got from the RTL or FCL that way
> and understand the consequences.

It is possible, but that state can't live in the string/object itself,
because strings are shared for read-only access (not sharing them incurs a
lot of copying overhead).

So that means that you need to allocate that state locally, either
explicitly by manually allocating an iterator object (as Jonas already
explained) or implicitly on the stack. The latter requires a native string
type though, and is therefore hard with objects.
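
To make the explicit-iterator variant concrete, here is a rough sketch. It is
in Python rather than Pascal, purely for illustration, and it assumes the
string data is UTF-8. All names in it (seq_len, codepoint_at,
CodepointIterator) are made up for this mail and are not RTL or FCL API. The
point is only that the scan position lives in the iterator, so the shared,
read-only string data is never touched.

```python
# Illustrative sketch only; every name here is hypothetical.
# Assumes the shared string data is valid UTF-8.

def seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def codepoint_at(buf: bytes, index: int) -> str:
    """Stateless S[i]: has to rescan from the start, so each call is O(index).
    Calling it for every i in a loop is what gives the O(n^2) behaviour."""
    pos = 0
    for _ in range(index):
        pos += seq_len(buf[pos])
    return buf[pos:pos + seq_len(buf[pos])].decode("utf-8")


class CodepointIterator:
    """Keeps the scan position outside the string, so the (immutable,
    shared) data needs no per-reader state. A full pass is O(n)."""

    def __init__(self, buf: bytes):
        self._buf = buf
        self._pos = 0  # local state: byte offset of the next code point

    def __iter__(self):
        return self

    def __next__(self) -> str:
        if self._pos >= len(self._buf):
            raise StopIteration
        n = seq_len(self._buf[self._pos])
        ch = self._buf[self._pos:self._pos + n].decode("utf-8")
        self._pos += n
        return ch


shared = "na\u00efve \u20ac5".encode("utf-8")  # one buffer, many readers

for ch in CodepointIterator(shared):  # the for..in style of access
    print(ch)
```

Two iterators over the same buffer advance independently, which is exactly
what would not work if the position were stored in the string itself.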

Implicit methods also have the disadvantage that the compiler must recognize
the access pattern. So usually that means only the simplest of cases (or
e.g. only when for..in is used).

> What assumptions are the typical String[i] user going to make about
> what is returned? 

IMHO development should not be driven by the users' assumptions.

If so, we would now have UIs with one red button with the text "do what I
think", since that seems to be what most users want and expect :-)

> There will be the types that are seeing if the fifth character is a 'C' or
> something like that, and for those there probably isn't too much that is
> going to go wrong, they might have to switch to "C" instead, or the
> compiler can make the 'C' literal a "unicode char which is really a
> string" conversion at compile time. 

This is a very rare case. With the increasing internationalization of
applications, operations on such literals are even rarer.