[fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
Tony Whyman
tony.whyman at mccallumwhyman.com
Thu Sep 5 10:28:44 CEST 2019
Apologies: when I typed "FTP" below I meant "FPC" :( I'm currently
drowning in acronym soup.
On 05/09/2019 09:24, Tony Whyman wrote:
>
> A few points:
>
> 1. IMHO, this is currently a Windows-only problem, because there the
> console buffer is UCS2. On Linux (and probably everywhere else) it is
> UTF8, though that is still to be verified.
>
> 2. The following Microsoft blog post is interesting background on
> where MS are going with this:
>
> https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/
>
> 3. The current Windows API includes "SetConsoleCP" which should (I
> haven't tested this) allow you to set transliteration to UTF-8 when
> you call the Windows ReadConsoleInput API function. This seems to
> imply that FTP can be a consistent UTF8 environment even when the
> Windows Console buffer is UCS2.
>
> 4. Because console input is buffered, you probably cannot have a
> situation where readln changes the console code page to fit the type
> (unicode or ansistring) of the variable that you are reading into.
>
> 5. You could change FTP so that, under Windows, the console is always
> read as UCS2, with transliteration to ansistring happening only when
> required, depending on the type of the variable that you are reading
> into. I think that is probably what you are asking for. Under
> Windows:
>
> - The console code page is always UCS2.
>
> - Console input is read into unicodestrings in native mode
>
> - Console input is read into ansistrings with transliteration from
> UCS2 after the input buffer has been parsed.
>
> - Conversion to integers, floats, etc. occurs after transliteration to
> ansistring in order to avoid too many changes to the RTL.
>
> - Under other OSs, Console input is UTF8 (or a supported ANSI code
> page). Transliteration to unicodestrings occurs after parsing the
> input buffer.
>
> 6. The question is: is it worth having a different approach to Windows
> when Windows allows you to set the console input buffer to UTF8 and
> hence have a common input environment for all OSs?
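The two designs contrasted in points 5 and 6 can be modelled in a few lines.
This is only a sketch in Python, not FPC code: every function name below is
hypothetical, the ANSI code page is assumed to be cp1252, and "UCS2" is treated
as UTF-16LE. It shows where the conversion step sits in each design, not how
the RTL would implement it.

```python
# Illustrative model only: these function names are hypothetical and are not
# part of the FPC RTL or the Windows API.

def first_line(text: str) -> str:
    """Return the first line of decoded console input, without its line ending."""
    lines = text.splitlines()
    return lines[0] if lines else ""


# --- Point 5: the console buffer is always UTF-16 (UCS2) ---

def read_unicode_from_utf16(buffer: bytes) -> str:
    # unicodestring target: the buffer is parsed natively, no code page conversion
    return first_line(buffer.decode("utf-16-le"))


def read_ansi_from_utf16(buffer: bytes, codepage: str = "cp1252") -> bytes:
    # ansistring target: parse the line first, then convert to the ANSI code page;
    # characters the code page cannot represent become "?"
    return first_line(buffer.decode("utf-16-le")).encode(codepage, errors="replace")


def read_int_from_utf16(buffer: bytes) -> int:
    # numeric target: conversion runs on the ansistring form
    return int(read_ansi_from_utf16(buffer))


# --- Point 6: the console buffer is UTF-8 on every OS ---

def read_ansi_from_utf8(buffer: bytes) -> bytes:
    # ansistring (UTF-8) target: split on the raw bytes, which is safe because
    # CR and LF never occur inside a multi-byte UTF-8 sequence
    lines = buffer.splitlines()
    return lines[0] if lines else b""


def read_unicode_from_utf8(buffer: bytes) -> str:
    # unicodestring target: conversion from UTF-8 happens after parsing
    return read_ansi_from_utf8(buffer).decode("utf-8")
```

In the first design the conversion cost falls on ansistring reads; in the
second it falls on unicodestring reads, but the parsing code is the same on
every platform.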
>
> On 05/09/2019 08:00, LacaK wrote:
>> Is there consensus on, or demand for, such a solution, and would a
>> patch in this direction be accepted?
>> If yes, we must agree on the implementation details, and IMO someone
>> must also check what the situation is in Delphi ... because I guess
>> that if Delphi does not support this, FPC will not diverge either?
>> Question 1: should "SetTextCodePage(CP_UTF16)" and
>> "SetTextCodePage(CP_UTF16BE)" be supported?
>> Question 2: is this supported in Delphi?
>> If the answer to both questions is YES, then I will file a bug report
>> as a starting point.
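For reference, the behaviour being asked for (reading lines from a UTF-16
text file in either byte order) looks roughly like this. It is a Python sketch
rather than FPC code; the function name read_utf16_lines is hypothetical, and
falling back to little-endian when no byte order mark is present is an
assumption, not something settled in this thread. The two encodings correspond
to Windows code pages 1200 (UTF-16LE) and 1201 (UTF-16BE).

```python
import codecs


def read_utf16_lines(path: str, default: str = "utf-16-le") -> list[str]:
    """Read a UTF-16 text file and return its lines without line endings.

    Hypothetical helper: the byte order is taken from the BOM if there is one,
    otherwise `default` is assumed.
    """
    with open(path, "rb") as f:
        data = f.read()

    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    elif data.startswith(codecs.BOM_UTF16_BE):
        text = data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    else:
        text = data.decode(default)

    # Split only after decoding: in UTF-16 a 0x0A byte can be half of another
    # character, so a byte-level search for line endings would be wrong.
    return text.splitlines()
```

The point of the sketch is the ordering: decode first, split into lines second,
which is the reverse of what a byte-oriented readln does.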