[fpc-pascal] Re: Console Encoding in Windows (Local VS. UTF8)

Noah Silva shiruba at galapagossoftware.com
Mon Jul 29 08:28:44 CEST 2013


Hello everyone!

Success!  Writeln was indeed the main culprit.

The following works:
  WritelnUTF8('あいうえお秋葉原');

(Where:
1. WritelnUTF8 is a procedure that calls the Win32 API WriteConsole
directly, bypassing whatever Writeln is doing.
2. The source code is saved as UTF8 with BOM.
3. I don't set the console output code page to UTF8.)
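
A helper of the kind described in the list above might look roughly like
the sketch below. It is a hypothetical reconstruction, not the original
code from this thread or from the forum post. It assumes FPC 3.0 or later
(where UTF8String carries its own code page), a source file saved as UTF8,
and the WriteConsoleW declaration from the Windows unit. The program name
and the local variable names are made up for illustration.

```pascal
program ConsoleUtf8Demo;

{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: the source file is stored as UTF-8

uses
  Windows;

// Hypothetical WritelnUTF8-style helper: converts UTF-8 to UTF-16 and
// hands it straight to the console, bypassing Writeln entirely.
procedure WritelnUTF8(const S: UTF8String);
var
  Wide: UnicodeString;
  Written: DWORD;
  StdOut: THandle;
begin
  // WriteConsoleW expects UTF-16, so decode the UTF-8 bytes first
  // and append the line break ourselves.
  Wide := UTF8Decode(S) + #13#10;
  StdOut := GetStdHandle(STD_OUTPUT_HANDLE);
  Written := 0;
  // The count is in UTF-16 code units, not bytes.
  WriteConsoleW(StdOut, PWideChar(Wide), Length(Wide), Written, nil);
end;

begin
  WritelnUTF8('あいうえお秋葉原');
end.
```

Note that WriteConsoleW only works when standard output is a real console.
If the output is redirected to a file or a pipe, the call fails, so a
fallback to an ordinary file write would be needed for that case.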

But I'm outputting UTF8 to an ANSI console and it works?!
Presumably, somewhere the RTL sees that my source code is UTF8 and converts
the string to ANSI, and that is what gets output. It displays correctly
because the console code page is ANSI.

A cleaner path would be to set the output code page to UTF8 and tell the
compiler not to touch my UTF8, but that doesn't seem to work.

My guess is that Writeln is somewhere trying to convert the already
converted text a second time, which would almost certainly result in
garbage!

Once I got the idea to use WriteConsole, I searched and found this:
http://forum.lazarus.freepascal.org/index.php?topic=17548.0

(svn: 37432)

Thank you,
    Noah Silva


2013/7/9 Noah Silva <shiruba at galapagossoftware.com>

> Hi,
>
> I deal with Japanese (and sometimes other languages) in a lot of my
> programs, and nothing I do seems to work consistently on Windows systems.
> (OS X is no problem.)
>
> I have followed steps in the Wiki, etc., but to little avail, so I have
> some questions for anyone who knows more than me:
> 1. What encoding "should" I be writing to the terminal?  From
> experimenting with text files using the cat command in PowerShell, it seems
> that the local ("ANSI") encoding should be used.  This makes sense, since
> older versions of Windows only supported local encodings.
> 2. Is there any reason why data written out in the local encoding (with
> write statements, etc.) should get corrupted?  For example, is some level
> of the RTL assuming something about the encoding? (I don't think so, but...)
> 3. Is there a way to set the output to UTF8 so I can just write out UTF8
> and be done with it?
>
> Just to give an example:
> 1. I read in an SJIS CSV file, and display it on the screen, and it's
> corrupted.
> 2. I convert it to UTF8 before displaying it, and it's still corrupted.
> 3. I cat the file to the screen and it's ok.
> 4. I write the output to a file instead of the console, and it's ok.
>
> Something seems odd.
>
> Does anyone else have these issues?
>
> Thank you,
>     Noah Silva
>
> p.s.: I know that the actual data in my programs isn't broken, because if
> I write it to a file or database, there is no problem with corruption.
>