[fpc-pascal] Console Encoding in Windows (Local VS. UTF8)

Noah Silva shiruba at galapagossoftware.com
Tue Jul 30 12:04:26 CEST 2013


Hi,

>
> >> If your source code is in UTF8 but you do not tell this to the compiler,
>
 ...

> > tried that some of the time.  I have never had any problems other than the
> > Windows Console issue.
>
> As mentioned before, that causes the compiler to directly pass your UTF-8
> data around without any conversion. That is how Lazarus works (it stores UTF8
> data in plain ansistrings) and as long as you only use LCL routines it will
> work fine, but using such code directly with the OS API (via the FPC RTL or
> not) will obviously cause problems if that API does not expect data in
> UTF-8 format.
>
>
I use UTF8 almost exclusively in most programs, and make exclusive use of
UTF8String.  (I only use ANSIString if the string is known to be in a local
encoding, such as having been read from an SJIS file.)  I was under the
impression, though, that UTF8String is just an alias for ANSIString for now
anyway.  I convert if necessary when using the OS APIs.  (The main problem
is knowing when it is necessary...)
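
To make the "convert if necessary" point concrete, here is a minimal sketch of what happens at the byte level. It is written in Python rather than Pascal purely as an illustration and does not use any FPC or Windows API; the helper names `shown_by_console` and `convert_for_console` are hypothetical, and code page 932 is assumed as the console's local (SJIS) encoding.

```python
# Illustration only: Python stands in for the byte-level behaviour; this is not FPC code.

# The Japanese word for "Japanese language", written with escapes so the
# encoding of this file does not matter.
TEXT = "\u65e5\u672c\u8a9e"


def shown_by_console(data: bytes, console_codepage: str) -> str:
    """Return what a console using console_codepage would display for these bytes."""
    return data.decode(console_codepage, errors="replace")


def convert_for_console(utf8_data: bytes, console_codepage: str) -> bytes:
    """Transcode UTF-8 bytes to the console code page before output."""
    return utf8_data.decode("utf-8").encode(console_codepage, errors="replace")


utf8_bytes = TEXT.encode("utf-8")

# Unconverted UTF-8 bytes interpreted as code page 932: the text is garbled.
raw_view = shown_by_console(utf8_bytes, "cp932")

# Converted first, then interpreted as code page 932: the text survives.
converted_view = shown_by_console(convert_for_console(utf8_bytes, "cp932"), "cp932")

print(raw_view == TEXT)        # False
print(converted_view == TEXT)  # True
```

The same reasoning applies to any API that expects the local encoding: the bytes have to be transcoded once, at the boundary, and nowhere else.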


> >>> Detecting the Unicode BOM or not seems to be a strange way to switch the
> >>> behavior of the compiler,
>

> > to make the encoding of the file clear, but not so much as a mode switch
> > (which is what the Wiki makes it sound like).
>
> It causes the compiler to interpret the string constants in your program
> as UTF-8 rather than as unknown binary data, and hence convert them at run
> time to the current ansi code page when assigning them to an
> ansistring/shortstring. This is unrelated to mode switches.
>

Ooh, this clears up a lot that the Wiki didn't explain very well!

So basically saving the file with a BOM tells the compiler/RTL to take care
of things for you, and saving as UTF8 without a BOM is appropriate if you
will take care of any conversions yourself.
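
As a rough model of that distinction, the sketch below checks for the UTF-8 BOM (the bytes EF BB BF) and then either leaves a string constant's bytes untouched or converts them to an ANSI code page. It is Python used only for illustration, it models the behaviour as described in this thread rather than the compiler's actual implementation, and the function names and the choice of code page 932 are assumptions.

```python
import codecs


def has_utf8_bom(source: bytes) -> bool:
    """True if the source file starts with the UTF-8 byte order mark (EF BB BF)."""
    return source.startswith(codecs.BOM_UTF8)


def literal_bytes_at_runtime(literal: bytes, source_has_bom: bool,
                             ansi_codepage: str) -> bytes:
    """Model of the described behaviour for a string constant.

    Without a BOM the bytes are treated as unknown binary data and passed through.
    With a BOM they are interpreted as UTF-8 and converted to the ANSI code page.
    """
    if not source_has_bom:
        return literal
    return literal.decode("utf-8").encode(ansi_codepage, errors="replace")


# HIRAGANA LETTER A as it would sit in a UTF-8 source file.
literal = "\u3042".encode("utf-8")

# Two hypothetical source files, identical except for the BOM.
with_bom = codecs.BOM_UTF8 + b"program demo;"
without_bom = b"program demo;"

print(literal_bytes_at_runtime(literal, has_utf8_bom(without_bom), "cp932").hex())  # e38182
print(literal_bytes_at_runtime(literal, has_utf8_bom(with_bom), "cp932").hex())     # 82a0
```

In the first case the program still holds UTF-8 bytes and must convert them itself before calling an API that expects the local encoding; in the second the constant is already in the local encoding.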

If UTF8String is just an alias for ANSIString, then I assume it also means
that right now the compiler would convert such constants to the local
encoding even when assigning to a UTF8String?  (If so, this explains why my
finally working console code works only without a BOM.)

Also, I assume that the treatment of ResourceString and any other constants
is the same?

> >> Those programs would be wrong. A user can easily change the console
> >> output
>

> > not even have the fonts for and then complains that the program doesn't
> > work.  The answer will be "We only support Japanese systems."
>
> Those are business decisions, which are irrelevant to the FPC RTL. Our
> code is written as much as possible to work correctly under all
> circumstances.
>
To be sure, that's an admirable goal!

>
> Jonas


Thanks for your comments,
    Noah Silva