[fpc-devel] Unit for handling UTF-8 strings

Michael Schnell mschnell at lumino.de
Tue Apr 9 09:49:11 CEST 2013


On 04/09/2013 08:49 AM, Mattias Gaertner wrote:
> But how do you examine the characters? 

Even defining what a "character" is turns out to be extremely
problematic with any use of Unicode. A single "printable character" can
be assembled from multiple Unicode code points (the code space runs from
U+0000 to U+10FFFF, i.e. just over a million values). A single code
point is represented by 1, 2, 3 or 4 bytes in UTF-8, and by 2 or 4 bytes
in UTF-16. For UTF-16 (and UTF-32) the order of those bytes additionally
depends on the CPU architecture and/or the file the string is imported
from and the way it is imported; UTF-8 has no byte-order variants.
This of course is not a problem introduced by fpc, but the perfectly
normal complexity of Unicode.
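
To make the distinction between printable characters, code points and
bytes concrete, here is a small sketch. It is written in Python only
because its standard library exposes the needed codecs directly; it is
not fpc code, and the helper name sizes is made up for this
illustration.

```python
import unicodedata

# One user-perceived character built from two code points:
# U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def sizes(s):
    """Return (code points, UTF-8 bytes, UTF-16 bytes) for the string s."""
    return (len(s), len(s.encode("utf-8")), len(s.encode("utf-16-le")))


print(sizes(decomposed))    # (2, 3, 4)
print(sizes(composed))      # (1, 2, 2)
print(sizes("\U0001F600"))  # (1, 4, 4): surrogate pair in UTF-16

# Byte order matters for UTF-16, not for UTF-8.
print("A".encode("utf-16-le"))  # b'A\x00'
print("A".encode("utf-16-be"))  # b'\x00A'
```

Both spellings of the accented letter look identical on screen, yet they
differ in code point count and in byte length under either encoding.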

> If I understand Michael right, there will be some "implicit functions" 
> for that. I wonder how they work. 

This is what Delphi compatibility dictated. (You might read the Delphi 
XE Docs on how to code Unicode enabled Delphi source.)

I do hope fpc avoids some of the quirks Delphi introduces and offers
some useful additional features, e.g. dedicated string types such as
unencoded (raw, never auto-converted) Byte, Word and DWord strings, and
a "flexibly encoded" string type that inherits the encoding scheme from
the source string when it is assigned or passed as a function
parameter, with auto-conversion done whenever dynamically necessary.
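
As a rough sketch of what such a "flexibly encoded" string could look
like: the class below is purely hypothetical (the names EncodedString
and converted are invented for this illustration and are neither fpc nor
Delphi API), and it is written in Python rather than Pascal. It assumes
the encoding tag travels with the value and that conversion happens only
when two differently encoded strings meet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical string type: raw bytes plus the name of their encoding."""
    data: bytes
    encoding: str

    def converted(self, encoding):
        """Return this string in the requested encoding, transcoding only if needed."""
        if encoding == self.encoding:
            return self  # same encoding: no conversion at all
        text = self.data.decode(self.encoding)
        return EncodedString(text.encode(encoding), encoding)

    def __add__(self, other):
        # Assumption: the result keeps the left operand's encoding;
        # the right operand is converted only when its encoding differs.
        right = other.converted(self.encoding)
        return EncodedString(self.data + right.data, self.encoding)


a = EncodedString("Gr\u00fc\u00dfe".encode("utf-8"), "utf-8")
b = a  # assignment: the encoding is inherited, nothing is converted
c = EncodedString(" M\u00fcnchen".encode("latin-1"), "latin-1")
d = a + c  # c is transcoded to UTF-8 on demand

print(b.encoding)                            # utf-8
print(d.encoding, d.data.decode(d.encoding))
```

Keeping the left operand's encoding is just one possible rule; a real
design would also have to decide what happens when the target encoding
cannot represent a character.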

-Michael


