[fpc-pascal] Console Encoding in Windows (Local VS. UTF8)

Noah Silva shiruba at galapagossoftware.com
Tue Jul 30 04:29:54 CEST 2013


Hi,

2013/7/29 Michael Schnell <mschnell at lumino.de>

>  On 07/29/2013 07:36 AM, Noah Silva wrote:
>
>
>>  Using UTF16 for internal string handling is a sensible option.
>
> It depends.
> UTF-16 uses more memory.
>

No, UTF16 only needs more memory if most of the text is ASCII.  It actually
uses less than UTF8 in the average case for Japanese, for example.
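To put rough numbers on that, here is a minimal sketch in Python (used only because it is quick to run; it is not FPC code). The two sample strings and the helper name encoded_sizes are just assumptions picked for illustration:

```python
# Compare how many bytes the same text needs in UTF-8 and in UTF-16.
# The sample strings are illustrative assumptions, not real-world data.

def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

ascii_text = "Console encoding"          # 16 ASCII characters
japanese_text = "コンソールの文字コード"  # 11 kana/kanji characters

ascii_sizes = encoded_sizes(ascii_text)        # (16, 32): UTF-16 doubles ASCII
japanese_sizes = encoded_sizes(japanese_text)  # (33, 22): UTF-8 needs 3 bytes each
```

Real Japanese text usually mixes in some ASCII (digits, markup, spaces), so the actual ratio depends on the data.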

> Linux OS API in most cases is 8 Bit, while Windows OS API is 16 bit
>

I assume that by 8-bit you mean a variable-width byte encoding like UTF8.


> Conversions are very expensive.
>

This is not as bad as some people make it out to be.  You have to be
converting a *lot* of data for it to be noticeable.
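If you want to measure it rather than take my word for it, something like the following sketch works. It is Python, so it times Python's codecs and not the FPC RTL conversion routines, and the sample text, function names, and repeat count are all assumptions. Treat the result as an order-of-magnitude hint only:

```python
import timeit

def roundtrip_utf8_utf16(data):
    """Convert UTF-8 bytes to UTF-16 and back, as a mixed-API program might."""
    utf16 = data.decode("utf-8").encode("utf-16-le")
    return utf16.decode("utf-16-le").encode("utf-8")

def seconds_per_roundtrip(data, repeats=20):
    """Average wall-clock seconds for one full round trip over data."""
    total = timeit.timeit(lambda: roundtrip_utf8_utf16(data), number=repeats)
    return total / repeats

# Hypothetical sample: a few hundred kilobytes of mixed Japanese/ASCII text.
sample = ("日本語のテキスト and some ASCII\n" * 10000).encode("utf-8")

if __name__ == "__main__":
    print(len(sample), "bytes:", seconds_per_roundtrip(sample), "s per round trip")
```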

> If you need to import/export a lot of data but don't do much calculating, of
> course using the import/export format all over the place is sensible.
> If you do many calculations, the type of calculation might suggest a
> certain encoding.
>

And if you don't do either (which most programs don't with string data),
then either format is just fine.

>> To address your specific points:
>> 1. Lazarus User API already supports UTF8 so far as I know.
>
> I suppose this is bound to change once fpc has completed the move to "new
> Delphi Strings".
>

I really don't think so; the reasons are explained in detail in the Wiki.

>> 2. TStringList could easily support both, but as long as the conversion
>> to/from other code pages (especially UTF8) is automatic, I wouldn't mind.
>
> I already delved into this in another thread here: I do believe that it is
> easily possible to invent a string type that supports any encoding and that
> can be used to create such a flexible TStringList, but this needs
> additional compiler support in a way that is not anticipated by Delphi.
> IMHO this is possible without risking noticeable performance degradation in
> any of the thinkable application variants.
>

From what I understand, the plan is for strings to store their codepage as
an attribute internally along with their length, and since the
compiler/runtime library will know their codepage, it can convert as
necessary.  Either way, you can make your own StringList variants for each
type easily enough.

For example, I normally use UTF8 for everything, but I have one linguistic
analysis program I wrote that does heavy duty analysis of the string, so
that stores everything in memory as UTF16.  I use StringList and similar
without any problems.  (I don't use UTF8 and UTF16 in the same structures
though...)
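To make the "codepage stored as an attribute" idea concrete, here is a toy model in Python. It is only a sketch of the concept as I understand it, not how the FPC compiler or RTL implements it; the class TaggedString and all of its methods are hypothetical names:

```python
class TaggedString:
    """Hypothetical stand-in for a string that records its own codepage."""

    def __init__(self, data, codepage):
        self.data = data          # raw bytes in the given codepage
        self.codepage = codepage  # e.g. "utf-8", "utf-16-le", "shift_jis"

    @classmethod
    def from_text(cls, text, codepage):
        return cls(text.encode(codepage), codepage)

    def converted(self, codepage):
        """Return this string in the requested codepage, converting only if needed."""
        if codepage == self.codepage:
            return self
        return TaggedString(self.data.decode(self.codepage).encode(codepage), codepage)

    def __add__(self, other):
        # Because both operands know their codepage, the "runtime" can convert
        # the right-hand side automatically instead of concatenating raw bytes.
        return TaggedString(self.data + other.converted(self.codepage).data,
                            self.codepage)

    def text(self):
        return self.data.decode(self.codepage)


utf8_part = TaggedString.from_text("漢字", "utf-8")
sjis_part = TaggedString.from_text("テスト", "shift_jis")
combined = utf8_part + sjis_part  # result stays in the left operand's codepage
```

Here combined.codepage is "utf-8" and combined.text() gives back the joined text intact. A list class built on such a type would not need to care which encoding each caller prefers.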

>> 3. Not sure what class inheritance has to do with this...
>
> If you do TStringList (in fact TStrings) that uses this new type in the
> user-programmer interface it needs to be possible to derive classes from
> those that use the fully Delphi compatible String types with predefined
> encoding. The compiler magic needs to be done appropriately to handle these
> cases, requesting automatic conversions (only) when necessary.
>

In fact, I am fine with manual conversions, so long as 99% of everything
"just works" with UTF8 and/or UTF16.  Then you would only need to
occasionally worry about local encoding for legacy import/export use.  It's
easy enough just to make an overload for Windows API calls (ick) that can
accept UTF8 or vice versa for UTF8 native calls you want to use with UTF16
if you really need to. The real issue to me is that until now, the compiler
doesn't actually *know* that a string is UTF8 or SJIS, which means it
doesn't give you an error when things aren't right, they just get garbled,
and the programmer gets left with a mystery to sort out.
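The garbling is easy to reproduce. A small sketch in Python (again only for illustration; the sample word and variable names are assumptions) shows what happens when SJIS bytes carry no encoding information and the consumer guesses wrong:

```python
# Bytes with no encoding attached: the consumer has to guess.
sjis_bytes = "テスト".encode("shift_jis")  # b'\x83e\x83X\x83g'

# Wrong guess 1: treat the bytes as Windows-1252. No error is raised,
# the text is silently garbled.
garbled = sjis_bytes.decode("cp1252")  # 'ƒeƒXƒg'

# Wrong guess 2: treat the bytes as UTF-8. This one happens to fail loudly,
# because 0x83 is not a valid UTF-8 lead byte.
try:
    sjis_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Only the correct codepage recovers the original text.
recovered = sjis_bytes.decode("shift_jis")  # 'テスト'
```

The first case is the nasty one: nothing fails, and the mystery only shows up later on screen or in a file.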


> -Michael
>

-- Noah
