[fpc-devel] Re: new 27 page document describing Unicode support in D2009

Graeme Geldenhuys graemeg.lists at gmail.com
Fri Nov 21 22:25:30 CET 2008


On Fri, Nov 21, 2008 at 11:08 PM, Graeme Geldenhuys
<graemeg.lists at gmail.com> wrote:
>
> I thought you guys might find this interesting. It's a new 27 page
> document describing Unicode support in D2009.
>
> http://dn.codegear.com/article/38980

Since I don't own D2009 and have only read about its Unicode support,
I found some of the information interesting, especially as it covers
things we argued about on this mailing list.

For example:

1...
  Length() returns the number of bytes for UTF8String,
  but the number of elements (16-bit code units, what we loosely
  know as characters) for String, i.e. UTF-16 strings.
  Length() also returns the number of bytes for AnsiString.

--------------------
        var
          str8: Utf8String;
          str16: string;
        begin
          str8 := 'Cantù';
          Memo1.Lines.Add('UTF-8');
          Memo1.Lines.Add('Length: ' + IntToStr(Length(str8)));
          Memo1.Lines.Add('5: ' + IntToStr(Ord(str8[5])));
          Memo1.Lines.Add('6: ' + IntToStr(Ord(str8[6])));
          str16 := str8;
          Memo1.Lines.Add('UTF-16');
          Memo1.Lines.Add('Length: ' + IntToStr(Length(str16)));
          Memo1.Lines.Add('5: ' + IntToStr(Ord(str16[5])));
        end;

As you might expect, the str8 string has a length of 6 (meaning 6
bytes), while the str16 string has a length of 5 (meaning 10 bytes,
though). Notice that Length invariably returns the number of string
elements, which in the case of variable-length representations
doesn't match the number of Unicode code points represented by the
string. This is the output of the program:
        UTF-8
        Length: 6
        5: 195
        6: 185
        UTF-16
        Length: 5
        5: 249

--------------------
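
To make the bytes-versus-code-points difference concrete, here is a
minimal sketch of my own (not from the document). It assumes a compiler
that has RawByteString (Delphi 2009 or a recent FPC); the function name
Utf8CodePointCount is made up for illustration. It counts code points
by skipping UTF-8 continuation bytes, which have the bit pattern
10xxxxxx. The two bytes of 'ù' are written in by hand so the example
does not depend on the encoding of the source file.

```pascal
program Utf8LengthDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: counts Unicode code points in a UTF-8 byte
// string by counting every byte that is NOT a continuation byte.
function Utf8CodePointCount(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then  // 10xxxxxx = continuation byte
      Inc(Result);
end;

var
  S: RawByteString;
begin
  S := 'Cant??';            // last two bytes are placeholders
  S[5] := AnsiChar($C3);    // 195, first byte of 'ù' in UTF-8
  S[6] := AnsiChar($B9);    // 185, second byte of 'ù' in UTF-8
  WriteLn('Length (bytes): ', Length(S));             // 6
  WriteLn('Code points:    ', Utf8CodePointCount(S)); // 5
end.
```

The same idea does not carry over directly to UTF-16, where you would
have to skip the second half of each surrogate pair instead.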

2...   TStrings can now take an encoding parameter to specify how it
should load or save files.

-----------------------------
STREAMING TSTRINGS
The LoadFromFile and SaveToFile methods of the TStrings class can be
called with an encoding. If you write a string list to a text file
without providing a specific encoding, the class will use
TEncoding.Default, which in turn uses the internal DefaultEncoding,
obtained on first use from the current Windows code page. In other
words, if you save a file you'll get the same ANSI file as before.
Of course, you can also easily force the file to a different format,
for example the UTF-16 format:

            Memo1.Lines.SaveToFile('test.txt', TEncoding.Unicode);
-----------------------------
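
For completeness, a small round-trip sketch of my own, assuming Delphi
2009 or later, where TStrings.SaveToFile and TStrings.LoadFromFile
accept a TEncoding argument. I have not checked whether FPC offers the
same overloads. The file name test_utf8.txt is just an example.

```pascal
program SaveWithEncodingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('first line');
    Lines.Add('second line');
    // Write the list as UTF-8 instead of the default ANSI code page.
    Lines.SaveToFile('test_utf8.txt', TEncoding.UTF8);
    // Read it back, stating the encoding explicitly.
    Lines.Clear;
    Lines.LoadFromFile('test_utf8.txt', TEncoding.UTF8);
    WriteLn('Lines read back: ', Lines.Count);
  finally
    Lines.Free;
  end;
end.
```

Leaving out the second argument falls back to TEncoding.Default, as
described in the excerpt above.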


Anyway, there are a lot more interesting facts in this document. It is
well worth reading to get a better understanding of Unicode.


Regards,
  - Graeme -


_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

