[fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
Tomas Hajny
XHajT03 at hajny.biz
Thu Sep 5 12:06:04 CEST 2019
On 2019-09-05 09:00, LacaK wrote:
.
.
> Is there consensus/demand on such solution and any patch in this
> direction will be accepted?
I'm not aware of any discussion about this so far, thus I cannot
talk about any existing consensus (let's hear others), but I believe
that such a consensus could be reached.
> If yes we must agree on implementation details and IMO also someone
> must check what situation is in Delphi ... because I guess, that if
> Delphi does not support this that also FPC will not diverge?
No, this is not necessarily the case. FPC certainly provides more
functionality in various areas. As long as the parts supported in both
Delphi and FPC are compatible, there should be no problem.
> Question1: should be supported "SetTextCodePage(CP_UTF16)" and
> "SetTextCodePage(CP_UTF16BE)"?
I don't know whether putting CP_UTF16 and CP_UTF16BE on the same level
as 8-bit encodings is the right solution. I can imagine that it might be
a completely new flag (e.g. CodepointSize) rather than relying on
background knowledge that CP_UTF16 and CP_UTF16BE are 2-byte, CP_UTF32
is 4-byte and all others are 1-byte encodings, because this knowledge
would need to be hardcoded in quite a few places and it would be too
easy to forget one.
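
A minimal sketch of what such a flag could look like, purely to
illustrate the idea. Everything here is hypothetical: TSketchTextInfo
stands in for the relevant part of TextRec, and SketchSetCodePage,
SketchUnitsInBuffer and the SKETCH_CP_* constants are invented for the
example (the constants use the Windows code page numbers). None of this
is existing RTL code.

```pascal
program CodepointSizeSketch;
{$mode objfpc}

const
  // Windows code page numbers, declared locally so that the sketch does
  // not depend on which CP_* constants the RTL exports.
  SKETCH_CP_UTF16   = 1200;
  SKETCH_CP_UTF16BE = 1201;
  SKETCH_CP_UTF32   = 12000;
  SKETCH_CP_UTF32BE = 12001;

type
  // Hypothetical stand-in for the relevant part of TextRec.
  TSketchTextInfo = record
    CodePage: Word;
    CodepointSize: Byte;  // hypothetical flag: bytes per code unit
  end;

// The only place that maps a code page to its unit size.
procedure SketchSetCodePage(var Info: TSketchTextInfo; ACodePage: Word);
begin
  Info.CodePage := ACodePage;
  case ACodePage of
    SKETCH_CP_UTF16, SKETCH_CP_UTF16BE: Info.CodepointSize := 2;
    SKETCH_CP_UTF32, SKETCH_CP_UTF32BE: Info.CodepointSize := 4;
  else
    Info.CodepointSize := 1;
  end;
end;

// Read/write code consults only the flag, here to compute how many
// code units fit into a buffer of BufBytes bytes.
function SketchUnitsInBuffer(const Info: TSketchTextInfo;
  BufBytes: SizeInt): SizeInt;
begin
  Result := BufBytes div Info.CodepointSize;
end;

var
  Info: TSketchTextInfo;
begin
  SketchSetCodePage(Info, SKETCH_CP_UTF16);
  WriteLn(Info.CodepointSize, ' ', SketchUnitsInBuffer(Info, 256));
end.
```

With this layout the knowledge about unit sizes lives in one routine,
and the read and write paths never compare code pages themselves.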
> Question2: is this supported in Delphi?
> If answer to both questions is YES then I will fill bug report as start
> point.
I have no idea about Delphi features (neither current nor future ones),
that is up to someone else.
> As I wrote there is in sources explicit comment: "// all standard
> input is assumed to be ansi-encoded" which will be no more true if we
> will add UTF-16 support.
Yes - checking places where this assumption is used as well as providing
an appropriate resolution needs to be part of the potential contribution.
> I can imagine, that we can add check for TextRec(T).CodePage=CP_UTF16
> and CP_UTF16BE and these two situations handle specially (in read and
> also in write procedures of text files)
See above regarding using this flag or some other.
> But as far as Read[Ln]/Write[Ln] is core functionality I think, that
> somebody of core developers should look at it ... ;-)
Yes, that's for sure. There's at least one person from the core team
list already involved. ;-) However, I'd be specifically interested in
the opinion of Jonas (who provided a great deal of the current Unicode
support), Michael and Marco; I guess that others may not have such
strong positions on this part of the RTL, but obviously any opinion
needs to be considered.
Tomas