[fpc-devel] Unicode and UTF8String

Marco van de Voort marcov at stack.nl
Mon Dec 1 14:57:47 CET 2008


In our previous episode, Martin Friebe said:
> >   
> I agree, using RTLString will probably help fpc to optimize your exe for 
> each OS.
> 
> But, using RTLString means you do not know, if you have UTF8 or not. 

Correct.

> Because UTF8 behaves slightly differently from other strings, many 
> operations cannot be performed on RTLString:
> 
> foo[1], copy, pos ... simply because you do not know whether the result is a 
> char, a codepoint or a subcodepoint (single utf8 byte)

You don't know that about UTF-16 either. Even though that is no problem in
most cases, it is about time to abandon overly simplistic thinking about
strings. The best solution is to minimize string editing and confine it to
certain parts of the code, keeping most of the code encoding-agnostic.
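
To make the UTF-16 point concrete, here is a minimal sketch in Python (used
purely for illustration; it is not FPC code and does not model RTLString).
The sample string and the helper utf16_units are made up for this example.
It shows that indexing by element yields a different kind of thing in each
encoding: a byte in UTF-8, a 16-bit code unit (possibly half of a surrogate
pair) in UTF-16, and only a full code point when working on code points.

```python
# Illustrative only: the sample string is arbitrary and not from the thread.
# 'a' followed by U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP.
text = "a\U0001D11E"


def utf16_units(s):
    """Hypothetical helper: split a string into 16-bit UTF-16 code units."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


utf8_bytes = list(text.encode("utf-8"))   # one entry per UTF-8 byte
units = utf16_units(text)                  # one entry per UTF-16 code unit
code_points = [ord(c) for c in text]       # one entry per code point

# The same two-character text has three different "lengths".
print(len(utf8_bytes), len(units), len(code_points))   # prints "5 3 2"

# Element 1 means something different in each representation.
print(hex(utf8_bytes[1]))    # 0xf0: lead byte of a 4-byte UTF-8 sequence
print(hex(units[1]))         # 0xd834: high surrogate, not a character by itself
print(hex(code_points[1]))   # 0x1d11e: the complete code point
```

Code that only passes such a string around, compares it, or concatenates it
never needs to care which of the three views applies; only the code that
indexes or cuts into the string does.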

And yes, if you are lazy, you lose performance due to automatic conversions.
It has always been that way (also when mixing shortstring and ansistring).

This is not just a good thing for OS interfacing code, but a good thing in
general.



