[fpc-pascal] Console Encoding in Windows (Local VS. UTF8)

Noah Silva shiruba at galapagossoftware.com
Mon Jul 29 04:21:04 CEST 2013


Hi,

This answer is a bit late, but...


2013/7/9 Dennis Poon <dennis at avidsoft.com.hk>

>
> I have followed steps in the Wiki, etc., but to little avail, so I have
>> some questions for anyone who knows more than me:
>> ...
>
> ...
>
> 4. I write the output to a file instead of the console, and it's ok.
>>
>
> Please state the windows version you are using. XP or Windows 7?


Windows 7, but there isn't any difference in this case.


>  I deal with Chinese in my programs so I know your problems. The same
> Delphi 5 program works differently on XP and Windows 7.  Looks like Windows
> 7 has removed support for non-Unicode (I am not sure whether the Unicode it
> uses is UTF8, UTF16 or UTF32).
>

When you say "removed support", what specifically are you talking about?
They both use the NT kernel and so use UTF16 almost everywhere internally,
but they assume that most programs use local encodings.


> Seems that all filenames in XP are treated as Unicode. If you type a
> non-Unicode file name in Explorer, it will be auto converted to Unicode.
>

Filenames as accessed from where?  Filenames in NTFS are always stored in
Unicode, so far as I know.  When you enter a file name in Explorer, how do
you know if it is being entered in Unicode or not?  Basically, explorer is
probably a Unicode-aware program (so far as I know), so I would assume any
text you enter starts life as Unicode via the IME.


> Also, in XP, when text is copied to MS Office from other programs and vice
> versa, XP seems to do an automatic conversion to UTF-8 and vice versa.
>

MS Office is also, of course, Unicode-aware, and the newer XML file formats
it uses save everything as Unicode, to the best of my knowledge.  When
importing/exporting text files, clipboard, etc., it will use the local
encoding for most things.

> That is, if you copy some text in your program which is encoded in SJIS and
> paste it to Word, the text seems to be auto converted to unicode.
>

Seems reasonable.


> As for file handling, it seems some programs will read the first 2 bytes
> of the text file to determine the encoding of the file. Google about it.
>

This is well known, though many programs examine much more than 2 bytes in
order to guess the encoding when there is no BOM.  Microsoft's guessing
algorithm is rather entertaining at times, though.
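
As a rough illustration of the BOM check being discussed, here is a minimal
sketch in Python (not FPC code, and not any particular program's actual
detection logic).  The function name sniff_bom is hypothetical.  Note that
two bytes are only enough to recognise a UTF-16 BOM; the UTF-8 BOM is three
bytes long and the UTF-32 BOMs are four.

```python
import codecs

# Known BOMs. The UTF-32 entries come first because the UTF-32 LE BOM
# starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

When this returns None, a program has to fall back on guessing from the
content, which is where the heuristics come in.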


> Sorry, I don't have exact answers to your questions. Just share some of my
> experience.
>
>
Well, certainly the more the merrier; other people who search this list may
benefit from your comments even if they aren't relevant to my case.

My problem is this:
I want to output text to the console and have it not be corrupted.  I have
the text in my program as UTF8, but when I write it to the console, it
shows corrupted garbage.  This happens whether I write it out as UTF8 or
ANSI (SJIS).  Since the console basically works with local encoding, it
should work when I use UTF8ToAnsi to convert the text before writing it,
but it doesn't.
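
To make the expectation concrete, here is a small sketch in Python (not FPC
code) of the byte-level mismatch involved, assuming a console whose code page
is CP932, the Windows variant of SJIS.  The sample string is chosen only for
illustration.  It shows why converting to the local encoding should work in
principle; it does not explain why the conversion fails in the FPC program.

```python
text = "日本語"  # sample text, chosen only for illustration

utf8_bytes = text.encode("utf-8")   # 3 bytes per character here
sjis_bytes = text.encode("cp932")   # 2 bytes per character here

# What a CP932 console would display if handed the raw UTF-8 bytes:
# the bytes are reinterpreted under the wrong encoding.
garbled = utf8_bytes.decode("cp932", errors="replace")

# What it would display if the bytes were converted to its code page first.
readable = sjis_bytes.decode("cp932")
```

Here readable equals the original text, while garbled does not.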

Since it works with other programs and things like the "type" command, it
occurred to me that it might be an FPC issue, or that I might be interfering
with some built-in auto-conversion feature (which is why I tried to output
as UTF8 as well).

I'll figure it out eventually, though.

> Dennis
>

Thank you,
    Noah Silva
