[fpc-pascal] Console Encoding in Windows (Local VS. UTF8)

Noah Silva shiruba at galapagossoftware.com
Tue Jul 30 04:17:25 CEST 2013


Hi,

2013/7/29 Jonas Maebe <jonas.maebe at elis.ugent.be>

>
> On 29 Jul 2013, at 08:01, Noah Silva wrote:
>
> > Oddly enough, I can get my (2.6.1) version to change the garbage it shows
> > by saving the file as UTF8 with BOM.  (Until now, I had been using
> > UTF8 w/o BOM).
>
> If your source code is in UTF8 but you do not tell this to the compiler,
> it will directly pass the "garbage" in your source code to the Win32 APIs.
> After all, there is no way for it to know what else it should do with it, and
> instead has to assume that the programmer knows what he is doing.


Yes, however in several places the Wikis recommend UTF8 w/o BOM, so I had
tried that some of the time.  I have never had any problems other than the
Windows Console issue.

> > Detecting the Unicode BOM or not seems to be a strange way to switch the
> > behavior of the compiler,
>
> Why? A BOM unequivocally states the encoding of a file.
>
Which is why I would normally think that one should be used at all times
to make the encoding of the file clear, but not so much as a mode switch
(which is what the Wiki makes it sound like).

In most other situations, adding/removing the BOM doesn't make much of a
difference.  Adding it may cause compatibility issues if programs don't
expect it, and it may help if they didn't properly autodetect the encoding
anyway.


> > I wonder why a $define wasn't used instead.
>
> A directive is also available:
> http://www.freepascal.org/docs-html/prog/progsu88.html
>
> And a command line parameter also (-Fc).


This makes sense to me.
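
For the archive, here is a minimal sketch of how I understand the directive
would be used. The program name and the string are made up for illustration,
and I am assuming that "utf8" is the value to pass to both the directive and
the -Fc option. How the text finally reaches the console will still depend on
the RTL version and the active console code page.

```pascal
program cpdemo;  // hypothetical program name

// Tell the compiler that this source file is UTF-8 encoded,
// whether or not the file starts with a BOM.
{$codepage utf8}

begin
  // Without the directive (or a BOM), these bytes would be passed
  // through unchanged, as described above.
  WriteLn('日本語');
end.
```

If I read the documentation correctly, compiling without the directive but
with "fpc -Fcutf8 cpdemo.pas" should have the same effect.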


> >> The ansi code page is (or at least can be) different, that's the
> >> result of
> >>  function GetACP:UINT; stdcall; external 'kernel32' name 'GetACP';
> >>
> > This is also 932, as expected.
> > (I assume they should always be the same though, I don't think most
> > programs check the console output page before writing).
>
> Those programs would be wrong. A user can easily change the console output
> page with the "chcp" command.
>
Perhaps theoretically, but for example, a Japanese vendor ships their
program and it works on every Japanese Windows system.  One user changes
their code page to Greek or some other code page that their system might
not even have the fonts for and then complains that the program doesn't
work.  The answer will be "We only support Japanese systems."  This makes
sense for local encodings, since you can usually support only one, and
things can't be cleanly cross-coded anyway.  (For example, if I changed my
console to Korean with chcp and my program tried to properly convert the
code page, it wouldn't work anyway, because the characters used in Japanese
don't exist in the destination character set).  With the possibility for
Local or Unicode character sets of course it makes sense to check which one
is in use and output to that one - I just doubt that most programs actually do.
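
A rough sketch of what such a check might look like, assuming the GetACP and
GetConsoleOutputCP declarations from the Windows unit (the program name is
made up, and this only reports the two values; it does not convert anything):

```pascal
program cpcheck;  // hypothetical program name
{$mode objfpc}

uses
  Windows;

var
  AnsiCP, ConsoleCP: UINT;
begin
  // Code page used by the "ANSI" Win32 APIs (932 on a Japanese system).
  AnsiCP := GetACP;
  // Code page the console currently uses for output; "chcp" changes this.
  ConsoleCP := GetConsoleOutputCP;

  WriteLn('ANSI code page:           ', AnsiCP);
  WriteLn('Console output code page: ', ConsoleCP);

  if AnsiCP <> ConsoleCP then
    WriteLn('The two differ, so ANSI-encoded text may display incorrectly.');
end.
```

A program that wanted to be correct would convert its output to the console
code page when the two values differ, within the limits described above.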

Thank you,
     Noah Silva


> Jonas