[fpc-pascal] FPC 2.0.2 vs 2.0.4 xmlread

Alexander Todorov alexx.todorov at gmail.com
Tue Oct 24 17:22:58 CEST 2006


Michael Van Canneyt wrote:
>
> XML is not in UTF-8.

What do you mean by that? I don't understand. If I delete the <?xml
... ?> line and save the rest as a text file, the "file" command reports
"UTF-8 Unicode text", and a web browser (e.g. Firefox) also shows the
encoding as UTF-8. Opening the file with mcview in hex mode, I saw that
Latin characters are encoded with 1 byte and Cyrillic ones with 2. I
manually checked the values of these bytes against a reference table.
The code point computed by (Ch and $1F) shl 6 + (Ch2 and $3F) in
function InternalGetChar agrees with that table as well. However, after
the value is assigned to FCurChar, "?" characters appear instead.

        FCurChar := WideChar((Ch and $1F) shl 6 + (Ch2 and $3F));
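To show what I checked by hand, here is a minimal standalone sketch of
the two-byte decoding step. DecodeTwoByte is a hypothetical helper name
I made up for this mail, not something in xmlread; the sample bytes
D0 96 are the UTF-8 encoding of U+0416 (CYRILLIC CAPITAL LETTER ZHE).

```pascal
program Utf8TwoByteDemo;

{$mode objfpc}

// Hypothetical helper: decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx)
// into its code point, using the same arithmetic as the expression above.
function DecodeTwoByte(Ch, Ch2: Byte): Word;
begin
  // Lead byte contributes its low 5 bits, continuation byte its low 6 bits.
  Result := ((Ch and $1F) shl 6) + (Ch2 and $3F);
end;

begin
  // D0 96 is U+0416, so this prints 0416.
  WriteLn(HexStr(DecodeTwoByte($D0, $96), 4));
end.
```

So the arithmetic itself gives the expected code point; the "?" must
come from somewhere after this step.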

>
> The XML unit is now XML 1.1 compliant. It performs conversion by itself,
> part of this change was indeed performed during the 2.0.4 release.

  { supported encodings }
  TEncoding = (enUnknown, enUTF8, enUTF16BE, enUTF16LE);

Does CP1251 work?

>
> My guess is that you simply don't need to do any conversion yourself.

I was doing UTF-8 to CP1251 conversion, but now the input I expected to
be UTF-8 doesn't look right to me. :)


I am going to test tomorrow with different text encodings and see what
happens.


