[fpc-devel] String and UnicodeString and UTF8String

Wed Jan 12 07:45:47 CET 2011

In our previous episode, Jeff Wormsley said:
> > encoding of full Unicode. Ansi and UCS2 (really UTF-16) only *look* 
> > easier to handle in user code, but both will fail and require special 
> > code whenever characters outside the assumed codepage may occur.
> 
> Preface: I don't write international apps, and probably won't for the 
> foreseeable future...
> 
> Isn't all of this concentration on trying to make strings have single 
> byte characters (who cares how they are encoded), using the argument 
> that it is somehow faster, incorrect for just about any modern 
> processor, including embedded CPU's such as ARM?

>  It was my 
> understanding that 32 bit aligned access was always faster than byte 
> aligned access on just about any CPU FPC still supports.

1-byte access is always 1-byte aligned, and the memory system is still
slower than these kind of issues. And you shuffle a lot of zeroes extra
around.

But the trouble is also that 2-byte situation doesn't really solve anything,
(you still have surrogates and it never will be as simple as it was), and a
much bigger problem with legacy (how many two byte data do you get daily,
and how much 1 byte?)

> The argument holds just fine for memory, but I don't really get the 
> speed argument.  Maybe I'm missing something.

Shoveling twice as much memory around IS the speed argument :-)