[fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

Tomas Hajny XHajT03 at hajny.biz
Thu Sep 5 12:06:04 CEST 2019


On 2019-09-05 09:00, LacaK wrote:
  .
  .
> Is there consensus/demand on such solution and any patch in this
> direction will be accepted?

I'm not aware of any discussion about this so far, thus I cannot 
talk about any existing consensus (let's hear others), but I believe 
that such a consensus could be reached.


> If yes we must agree on implementation details and IMO also someone
> must check what situation is in Delphi ... because I guess, that if
> Delphi does not support this that also FPC will not diverge?

No, this is not necessarily the case. FPC certainly provides more 
functionality in various areas. As long as the parts supported in both 
Delphi and FPC are compatible, there should be no problem.


> Question1: should be supported "SetTextCodePage(CP_UTF16)" and
> "SetTextCodePage(CP_UTF16BE)"?

I don't know whether putting CP_UTF16 and CP_UTF16BE on the same level 
as 8-bit encodings is the right solution. I can imagine that it might be 
a completely new flag (e.g. CodepointSize) rather than relying on 
background knowledge that CP_UTF16 and CP_UTF16BE are 2-byte, CP_UTF32 
is 4-byte and others are 1-byte encodings, because this knowledge would 
need to be hardcoded in quite a few places and it would be too easy to 
forget one.
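
To illustrate the concern, here is a minimal sketch (not existing RTL 
code) of keeping the size knowledge in one place instead of repeating 
code page comparisons throughout the text file routines. The helper name 
CodeUnitSize and the cpXXX constants are hypothetical and declared 
locally; the numeric values are the usual Windows code page identifiers.

```pascal
program CodeUnitSizeSketch;
{$mode objfpc}

const
  { Code page identifiers declared locally, so the sketch does not
    depend on which CP_xxx constants a given RTL version exports. }
  cpUtf16LE = 1200;
  cpUtf16BE = 1201;
  cpUtf32LE = 12000;
  cpUtf32BE = 12001;
  cpUtf8    = 65001;

{ Hypothetical helper: size in bytes of one code unit for a code page.
  This would be the single place holding the "2 bytes / 4 bytes /
  1 byte" knowledge. }
function CodeUnitSize(CodePage: Word): Integer;
begin
  case CodePage of
    cpUtf16LE, cpUtf16BE: Result := 2;
    cpUtf32LE, cpUtf32BE: Result := 4;
  else
    Result := 1;  { everything else, including UTF-8, uses 1-byte units }
  end;
end;

begin
  WriteLn('UTF-16LE: ', CodeUnitSize(cpUtf16LE));
  WriteLn('UTF-16BE: ', CodeUnitSize(cpUtf16BE));
  WriteLn('UTF-32LE: ', CodeUnitSize(cpUtf32LE));
  WriteLn('UTF-8:    ', CodeUnitSize(cpUtf8));
end.
```

Whether such a value would be computed on demand like this or stored as 
a separate field when the code page is set is exactly the kind of 
implementation detail that would need to be agreed on.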


> Question2: is this supported in Delphi?
> If answer to both questions is YES then I will fill bug report as start 
> point.

I have no idea about Delphi features (neither current nor future ones), 
that is up to someone else.


> As I wrote there is in sources explicit comment: "// all standard
> input is assumed to be ansi-encoded" which will be no more true if we
> will add UTF-16 support.

Yes - checking places where this assumption is used as well as providing 
an appropriate resolution need to be part of the potential contribution.


> I can imagine, that we can add check for TextRec(T).CodePage=CP_UTF16
> and CP_UTF16BE and these two situations handle specially (in read and
> also in write procedures of text files)

See above regarding using this flag or some other.


> But as far as Read[Ln]/Write[Ln] is core functionality I think, that
> somebody of core developers should look at it ... ;-)

Yes, that's for sure. There's at least one person from the core team 
list already involved. ;-) However, I'd be specifically interested in 
the opinion of Jonas (who provided a great deal of the current Unicode 
support), Michael and Marco; I guess that others may not have such strong 
positions in this RTL part, but obviously any opinion needs to be 
considered.

Tomas

