[fpc-devel] Unit for handling UTF-8 strings
Michael Schnell
mschnell at lumino.de
Tue Apr 9 09:49:11 CEST 2013
On 04/09/2013 08:49 AM, Mattias Gaertner wrote:
> But how do you examine the characters?
Even defining what a character is, is extremely problematic with any use
of Unicode. A "printable character" can be assembled from multiple
Unicode code points (the code space runs up to U+10FFFF, roughly 1.1
million values), and a single code point is represented by 1 to 4 bytes
in UTF-8, or by 2 or 4 bytes in UTF-16. With UTF-16, the order of those
bytes additionally depends on the CPU architecture and/or the file the
string is imported from and the way it is imported.
This of course is not a problem introduced by fpc, but the perfectly
normal complexity of Unicode.
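To make this concrete, here is a small language-neutral illustration, written in Python only because its standard library exposes the pieces directly. It is a sketch, not anything from fpc; the names `encoded_sizes`, `samples` and `sizes` are made up for the example.

```python
import unicodedata

# "e" followed by a combining acute accent: one visible character, two code points.
decomposed = "e\u0301"
# NFC normalization folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single code point."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# One code point from each UTF-8 length class (1, 2, 3 and 4 bytes).
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]
sizes = {"U+%04X" % ord(ch): encoded_sizes(ch) for ch in samples}

# Byte order only shows up in UTF-16 (and UTF-32); UTF-8 has one fixed byte sequence.
euro_le = "\u20ac".encode("utf-16-le")
euro_be = "\u20ac".encode("utf-16-be")

if __name__ == "__main__":
    print(len(decomposed), len(composed))
    print(sizes)
    print(euro_le.hex(), euro_be.hex())
```

The same visible "é" thus has length 1 or 2 depending on normalization, before byte counts even enter the picture.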
> If I understand Michael right, there will be some "implicit functions"
> for that. I wonder how they work.
This is what Delphi compatibility dictated. (You might read the Delphi
XE docs on how to write Unicode-enabled Delphi source.)
I do hope fpc avoids some of the quirks Delphi introduces and offers
some useful additional features, e.g. dedicated string types such as
unencoded (raw, never auto-converted) Byte, Word and DWord strings, and
a "flexible encoded" string type that inherits the encoding scheme from
the source string on assignment or when passed as a function
parameter, doing auto-conversion whenever dynamically necessary.
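A rough sketch of the behaviour I have in mind, in Python for brevity. Everything here is hypothetical: `FlexString` and `RawString` are invented names for illustration and do not correspond to any existing or planned fpc type.

```python
class RawString:
    """Hypothetical unencoded string of fixed-width units (1, 2 or 4 bytes each).

    It carries no encoding and defines no mixing with encoded strings,
    so nothing is ever auto-converted.
    """

    def __init__(self, units, unit_size):
        self.units = list(units)
        self.unit_size = unit_size


class FlexString:
    """Hypothetical string that stores its bytes together with their encoding."""

    def __init__(self, data, encoding):
        self.data = data
        self.encoding = encoding

    @classmethod
    def from_text(cls, text, encoding):
        return cls(text.encode(encoding), encoding)

    def assign_from(self, source):
        # Assignment inherits the source's encoding; the bytes are copied unchanged.
        self.data = source.data
        self.encoding = source.encoding

    def converted(self, encoding):
        # Convert only when the encodings actually differ.
        if encoding == self.encoding:
            return self
        text = self.data.decode(self.encoding)
        return FlexString(text.encode(encoding), encoding)

    def __add__(self, other):
        if not isinstance(other, FlexString):
            return NotImplemented  # e.g. RawString: refuse instead of guessing
        # Mixed encodings: bring the right operand to the left operand's encoding.
        return FlexString(self.data + other.converted(self.encoding).data,
                          self.encoding)


if __name__ == "__main__":
    utf8 = FlexString.from_text("Gr\u00fc\u00dfe", "utf-8")
    utf16 = FlexString.from_text(" \u20ac", "utf-16-le")
    joined = utf8 + utf16
    print(joined.encoding, joined.data.decode(joined.encoding))
```

Choosing the left operand's encoding for a mixed concatenation is an arbitrary decision made for this sketch; a real design would have to define that rule carefully.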
-Michael