[fpc-devel] Unicode support (yet again)

Flávio Etrusco flavio.etrusco at gmail.com
Mon Sep 19 04:34:21 CEST 2011


On Sun, Sep 18, 2011 at 11:45 AM, Jonas Maebe <jonas.maebe at elis.ugent.be> wrote:
>
> On 18 Sep 2011, at 13:57, Flávio Etrusco wrote:
>
>> One obvious way to mitigate this would be to store the last
>> CodePoint->Char in the string record, so that at least the most common
>> case is covered.
>
> ... and so that the common case is broken in multithreaded environments.
>
> Directly indexing a string will most likely always work using fixed-length steps (8, 16, 32 bit).
> If you want to iterate based on anything else (such as code points), use some kind of
> iterator model instead.
>
> Jonas

By "the most common case" I meant non-threaded code ;-) But no, I don't
see any trivial and efficient way to avoid the worst case (though among
threadvars, a per-string fixed lookup table, shared lookup caches,
per-reference data (as with Object), etc., there must be a good solution).
Basically, I think UnicodeString should move farther away from PChar
than AnsiString has, from the compiler/RTL point of view.
I think the user should have to use the iterator model to iterate over
the string *efficiently*, but I see indexed access as a compatibility
feature, and as such it should favor correctness and ease of use over
performance. I thought the endless bugs caused by confusing char and
code point indexes, even in software written in Java, would sell my
argument...
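
To make the char vs. code point distinction concrete, here is a minimal
sketch in Python (purely illustrative; it is not FPC or RTL code, and all
function names are hypothetical). It treats a string as a list of UTF-16
code units, shows a single-pass code point iterator, and shows indexed
access by code point, which is correct but linear in the index:

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def iter_code_points(units):
    """Yield (unit_index, code_point) pairs in one linear pass."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        if (0xD800 <= u <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield i, 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through as-is).
            yield i, u
            i += 1


def code_point_at(units, k):
    """Indexed access by code point: correct, but O(k) per call."""
    for pos, (_, cp) in enumerate(iter_code_points(units)):
        if pos == k:
            return cp
    raise IndexError(k)


# 'a', U+1F600 (outside the BMP, so two code units), 'b'
units = utf16_units("a\U0001F600b")
```

Here units[2] is the low surrogate of U+1F600, while code_point_at(units, 2)
is 'b'. Mixing up the two kinds of index is exactly the class of bug I
mean, and calling code_point_at in a loop is the quadratic worst case the
iterator avoids.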

-Flávio


