[fpc-pascal] Console Encoding in Windows (Local VS. UTF8)
Jonas Maebe
jonas.maebe at elis.ugent.be
Tue Jul 30 09:56:06 CEST 2013
On 30 Jul 2013, at 04:17, Noah Silva wrote:
> 2013/7/29 Jonas Maebe <jonas.maebe at elis.ugent.be>
>
>> If your source code is in UTF8 but you do not tell this to the compiler,
>> it will directly pass the "garbage" in your source code to the Win32 APIs.
>> After all, there is no way for it to know what else it should do with it, and
>> instead has to assume that the programmer knows what he is doing.
>
> Yes, however in several places the Wikis recommend UTF8 w/o BOM, so I had
> tried that some of the time. I have never had any problems other than the
> Windows Console issue.
As mentioned before, that causes the compiler to pass your UTF-8 data around without any conversion. That is how Lazarus works (it stores UTF-8 data in plain ansistrings), and as long as you only use LCL routines it will work fine. Using such code directly with the OS API (via the FPC RTL or not) will obviously cause problems if that API does not expect data in UTF-8 format.
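As a rough illustration of the "no conversion" case, here is a minimal sketch. It assumes the file is saved as UTF-8 without a BOM and is compiled without any code page directive; the program name is made up for the example. The character é is encoded as the two bytes C3 A9 in UTF-8, so the ansistring should simply hold those two bytes, and an API expecting the ANSI code page would treat them as two separate characters.

```pascal
program RawBytesDemo;
{$mode objfpc}{$H+}

// Assumption: this source file is saved as UTF-8 *without* a BOM and
// no {$codepage} directive is present, so the compiler is not told
// what encoding the literal below uses.
var
  s: AnsiString;
  i: Integer;
begin
  s := 'é';  // in UTF-8 this character occupies two bytes: $C3 $A9

  // Dump the bytes actually stored in the string, in hexadecimal.
  for i := 1 to Length(s) do
    Write(HexStr(Ord(s[i]), 2), ' ');
  WriteLn;
end.
```

Dumping the bytes rather than printing the string keeps the result independent of whatever code page the console happens to use.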
>>> Detecting the Unicode BOM or not seems to be a strange way to switch the
>>> behavior of the compiler,
>>
>> Why? A BOM unequivocally states the encoding of a file.
>>
> Which is why I would normally think that one should be used at all times
> to make the encoding of the file clear, but not so much as a mode switch
> (which is what the Wiki makes it sound like).
It causes the compiler to interpret the string constants in your program as UTF-8 rather than as unknown binary data, and hence convert them at run time to the current ansi code page when assigning them to an ansistring/shortstring. This is unrelated to mode switches.
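For comparison, a minimal sketch of the case where the compiler does know the source encoding. It uses the `{$codepage utf8}` directive in place of a BOM (that substitution is an assumption made for this example, since a BOM cannot be shown in a listing), and the program name is hypothetical. Because the constant is now understood as text rather than opaque bytes, assigning it to a UnicodeString should yield one UTF-16 code unit for a precomposed é instead of two raw bytes.

```pascal
program Utf8ConstantsDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: stands in for a UTF-8 BOM in this sketch

var
  u: UnicodeString;
begin
  // The compiler decodes the literal from UTF-8, so it is stored as
  // UTF-16 text rather than as the raw source bytes.
  u := 'é';
  WriteLn('UTF-16 code units: ', Length(u));
end.
```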
>> Those programs would be wrong. A user can easily change the console output
>> page with the "chcp" command.
>>
> Perhaps theoretically, but for example, a Japanese vendor ships their
> program and it works on every Japanese Windows system. One user changes
> their code page to Greek or some other code page that their system might
> not even have the fonts for and then complains that the program doesn't
> work. The answer will be "We only support Japanese systems."
Those are business decisions, which are irrelevant to the FPC RTL. Our code is written as much as possible to work correctly under all circumstances.
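To illustrate the point about "chcp" quoted above, a small sketch that asks Windows for the current code pages instead of assuming them. `GetConsoleOutputCP` and `GetACP` are Win32 API functions exposed through FPC's Windows unit; the program name and the non-Windows fallback branch are only there for the example.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}

{$ifdef windows}
uses
  Windows;
{$endif}

begin
  {$ifdef windows}
  // The console output code page can be changed by the user with "chcp",
  // so it may differ from the system ANSI code page.
  WriteLn('Console output code page: ', GetConsoleOutputCP);
  WriteLn('System ANSI code page:    ', GetACP);
  {$else}
  WriteLn('This sketch only applies to Windows consoles.');
  {$endif}
end.
```

Running it before and after a "chcp" command in the same console window should show the first value change while the second stays the same.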
Jonas