[fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

Tony Whyman tony.whyman at mccallumwhyman.com
Thu Sep 5 10:24:55 CEST 2019


A few points:

1. IMHO, this is currently a Windows-only problem, because the Windows
console buffer is UCS2. On Linux (and probably everywhere else, though
that needs to be verified) it is UTF8.

2. The following Microsoft blog post is interesting background on where 
MS are going with this:

https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/

3. The current Windows API includes "SetConsoleCP", which should (I
haven't tested this) allow you to set transliteration to UTF-8 when you
call the Windows ReadConsoleInput API function. This seems to imply that
FPC can be a consistent UTF8 environment even when the Windows Console
buffer is UCS2.

4. Because console input is buffered, you probably cannot have a 
situation where readln changes the console code page to fit the type 
(unicode or ansistring) of the variable that you are reading into.

5. You could change FPC so that under Windows the console is always
read using UCS2, with transliteration to ansistring happening only when
required, depending on the type of the variable that you are reading
into. I think that is probably what you are asking for under Windows:

- The console code page is always UCS2.

- Console input is read into unicodestrings in native mode.

- Console input is read into ansistrings with transliteration from UCS2 
after the input buffer has been parsed.

- Conversion to integers, floats, etc. occurs after transliteration to 
ansistring in order to avoid too many changes to the RTL.

- Under other OSs, console input is UTF8 (or a supported ANSI code
page). Transliteration to unicodestrings occurs after parsing the input
buffer.
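
To make the order of operations in the Windows part of that list
concrete, here is a rough, portable sketch. It is not how the RTL does
it: the UnicodeString "Buf" merely stands in for the console buffer, and
TakeLine is a hypothetical helper invented for this illustration. The
point is only that the line is cut out of the UTF-16 data first, and
conversion to ansistring (and then to an integer) happens afterwards.

```pascal
program ParseThenConvertSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}  // widestring manager for conversions on Unix
  SysUtils;

// Hypothetical helper (not an RTL routine): remove and return the first
// line of a UTF-16 buffer. The line ending (LF or CR LF) is dropped.
function TakeLine(var Buf: UnicodeString): UnicodeString;
var
  P: SizeInt;
begin
  // Find the first LF, or run off the end if there is none.
  P := 1;
  while (P <= Length(Buf)) and (Buf[P] <> #10) do
    Inc(P);
  Result := Copy(Buf, 1, P - 1);
  Delete(Buf, 1, P);
  // Drop a trailing CR left over from a CR LF pair.
  if (Length(Result) > 0) and (Result[Length(Result)] = #13) then
    SetLength(Result, Length(Result) - 1);
end;

var
  Buf, ULine: UnicodeString;
  ALine: AnsiString;
  N: Integer;
begin
  // Stand-in for the console buffer: two lines of UTF-16 text,
  // the first one containing a non-ASCII character (U+00E9).
  Buf := 'caf';
  Buf := Buf + WideChar($00E9) + #13#10 + '42' + #13#10;

  // UnicodeString target: the parsed line is used as-is.
  ULine := TakeLine(Buf);

  // AnsiString target: conversion happens only after the line was parsed.
  ALine := AnsiString(TakeLine(Buf));

  // Numeric conversion is done on the ansistring, as suggested above.
  N := StrToInt(ALine);

  WriteLn('First line has ', Length(ULine), ' UTF-16 code units');
  WriteLn('Second line as integer: ', N);
end.
```

The explicit AnsiString cast converts to the default system code page,
so whether a character such as U+00E9 survives in an ansistring depends
on that code page.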

6. The question is: is it worth having a different approach for Windows
when Windows allows you to set the console input buffer to UTF8 and
hence have a common input environment for all OSs?
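
For anyone who wants to experiment with the UTF8 route from point 3,
something along these lines might be a starting point. It is an untested
sketch: it assumes the SetConsoleCP and SetConsoleOutputCP bindings in
the Windows unit and the RTL's SetTextCodePage, and the program and
variable names are only illustrative. Whether non-ASCII input really
arrives intact may depend on the Windows version, so this is something
to try out rather than a confirmed solution.

```pascal
program ConsoleUtf8Sketch;
{$mode objfpc}{$H+}

{$ifdef windows}
uses
  Windows;
{$endif}

var
  Line: UTF8String;
begin
  {$ifdef windows}
  // Ask Windows to deliver console input to the program as UTF-8 and to
  // interpret program output as UTF-8. Both calls return False on failure.
  if not SetConsoleCP(CP_UTF8) then
    WriteLn('SetConsoleCP failed, error ', GetLastError);
  if not SetConsoleOutputCP(CP_UTF8) then
    WriteLn('SetConsoleOutputCP failed, error ', GetLastError);
  // Tell the RTL that the bytes arriving on Input are UTF-8.
  SetTextCodePage(Input, CP_UTF8);
  {$endif}

  // On other platforms the console is assumed to be UTF-8 already.
  ReadLn(Line);
  WriteLn('Read ', Length(Line), ' bytes');
end.
```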

On 05/09/2019 08:00, LacaK wrote:
> Is there consensus/demand for such a solution, and will any patch in
> this direction be accepted?
> If yes, we must agree on implementation details, and IMO someone must
> also check what the situation is in Delphi ... because I guess that if
> Delphi does not support this, FPC will not diverge either?
> Question1: should "SetTextCodePage(CP_UTF16)" and
> "SetTextCodePage(CP_UTF16BE)" be supported?
> Question2: is this supported in Delphi?
> If the answer to both questions is YES then I will file a bug report
> as a starting point.