[fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
LacaK
lacak at zoznam.sk
Wed Sep 4 13:39:24 CEST 2019
> You may be able to improve on this using system.BlockRead.
Probably yes, but then I would have to read into a local buffer and scan
it for CR/LF.
My function UCS2ReadLn() would then return only the part of the string up
to the CR/LF and return the rest on the next call,
so I would have to keep the unprocessed part in a global buffer.
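For illustration, the buffered approach described above could look roughly like the sketch below. It is only a sketch under assumptions: the names TBlockBuf, Head, Fill and the demo file name ucs2_demo.tmp are made up for this example, the unprocessed part is kept in a record passed by the caller instead of a global buffer, and the block is interpreted in the host byte order, so it assumes a little-endian file on a little-endian CPU.

```pascal
program ucs2blockread;
{$mode objfpc}{$h+}

const
  // Demo data: BOM, 'A', 'B', CR, LF, 'C' as UTF-16 LE bytes.
  Demo: array[0..11] of Byte =
    ($FF, $FE, $41, $00, $42, $00, $0D, $00, $0A, $00, $43, $00);

type
  // Holds the part of the last block that has not been handed out yet.
  TBlockBuf = record
    Data: array[0..2047] of WideChar;
    Head: LongInt;   // index of the next unread code unit
    Fill: LongInt;   // number of valid code units in Data
  end;

// Returns one line per call; False once no data is left.
function ReadLine(var F: file; var B: TBlockBuf; out S: UnicodeString): Boolean;
var
  BytesRead: LongInt;
  WC: WideChar;
begin
  S := '';
  Result := False;
  while True do
  begin
    if B.Head >= B.Fill then
    begin
      // Buffer exhausted: fetch the next block (an odd trailing byte is dropped).
      BytesRead := 0;
      BlockRead(F, B.Data, SizeOf(B.Data), BytesRead);
      B.Fill := BytesRead div SizeOf(WideChar);
      B.Head := 0;
      if B.Fill = 0 then
        Exit;        // end of file; Result is True if a last line was collected
    end;
    WC := B.Data[B.Head];
    Inc(B.Head);
    Result := True;
    if WC = WideChar($000A) then
      Exit;          // LF ends the line; the rest of the block stays in B
    if (WC <> WideChar($000D)) and (WC <> WideChar($FEFF)) then
      S := S + WC;   // skip CR and the BOM, keep everything else
  end;
end;

var
  F: file;
  B: TBlockBuf;
  Line, Last: UnicodeString;
  Count: Integer;
begin
  // Create a small test file so the example is self-contained.
  Assign(F, 'ucs2_demo.tmp');
  Rewrite(F, 1);
  BlockWrite(F, Demo, SizeOf(Demo));
  Close(F);

  B.Head := 0;
  B.Fill := 0;
  Count := 0;
  Last := '';
  Reset(F, 1);       // record size 1, so BlockRead counts bytes
  while ReadLine(F, B, Line) do
  begin
    Inc(Count);
    Last := Line;
    WriteLn('US = ', Line);
  end;
  Close(F);
end.
```

Unlike UCS2ReadLn(), this variant does not skip empty lines, and a file holding only a BOM yields one empty line.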
>
> Also, you are assuming low order byte first which may not be portable.
Yes. In my case LE is sufficient, as long as I check for the presence of the BOM $FF $FE.
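A BOM check of that kind might be sketched as follows. The type TByteOrder and the functions DetectByteOrder and MakeUnit are hypothetical names for this example, not part of the RTL. The sketch classifies the first two bytes of a file and builds each code unit from its two bytes explicitly, so the result does not depend on the byte order of the CPU.

```pascal
program bomcheck;
{$mode objfpc}{$h+}

type
  TByteOrder = (boUnknown, boLittleEndian, boBigEndian);

// Classifies the first two bytes of a file:
// FF FE is the UTF-16 LE BOM, FE FF the UTF-16 BE BOM.
function DetectByteOrder(B0, B1: Byte): TByteOrder;
begin
  if (B0 = $FF) and (B1 = $FE) then
    Result := boLittleEndian
  else if (B0 = $FE) and (B1 = $FF) then
    Result := boBigEndian
  else
    Result := boUnknown;
end;

// Builds one code unit from two bytes given in file order.
// Anything other than boBigEndian is treated as little-endian.
function MakeUnit(B0, B1: Byte; Order: TByteOrder): WideChar;
begin
  if Order = boBigEndian then
    Result := WideChar((Word(B0) shl 8) or B1)
  else
    Result := WideChar((Word(B1) shl 8) or B0);
end;

begin
  WriteLn(DetectByteOrder($FF, $FE) = boLittleEndian);
  WriteLn(Ord(MakeUnit($41, $00, boLittleEndian)));
  WriteLn(Ord(MakeUnit($00, $41, boBigEndian)));
end.
```

In UCS2ReadLn() the two bytes read into c.a[0] and c.a[1] could be passed through such a function instead of relying on the variant record overlay.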
L.
>
> On 04/09/2019 11:14, LacaK wrote:
>> Nice! Thank you very much.
>>
>> As an alternative for F:TextFile I am using:
>>
>> procedure UCS2ReadLn(var F: TextFile; out s: String);
>> var
>> c: record
>> case boolean of
>> false: (a: array[0..1] of AnsiChar);
>> true : (w: WideChar);
>> end;
>> begin
>> s:='';
>> while not Eof(F) do begin
>> System.Read(F,c.a[0]);
>> System.Read(F,c.a[1]);
>> if c.w in [#10,#13] then
>> if s = '' then {begin of line} else break {end of line}
>> else
>> s := s + c.w;
>> end;
>> end;
>>
>> which also works for me, but I would like to have a better solution.
>> I will try LoadFromFile with TEncoding once FPC 3.2 is out.
>>
>> -L.
>>
>>> Stupid and lazy workaround, probably not suitable for larger files.
>>>
>>> {$mode objfpc}
>>> {$h+}
>>> uses
>>> sysutils;
>>>
>>> type
>>> TUCS2TextFile = file of WideChar;
>>>
>>> procedure ReadLine(var F: TUCS2TextFile; out S: UnicodeString);
>>> var
>>> WC: WideChar;
>>> begin
>>> //Assume file is opened for read
>>> S := '';
>>> while not Eof(F) do
>>> begin
>>> Read(F, WC);
>>> if WC = WideChar(#$000A) then
>>> exit
>>> else
>>> if (WC <> WideChar(#$000D)) and (WC <> WideChar(#$FEFF {Unicode LE BOM})) then
>>> S := S + WC;
>>> end;
>>> end;
>>>
>>> var
>>> UFile: TUCS2TextFile;
>>> US: UnicodeString;
>>> begin
>>> AssignFile(UFile, 'ucs2.txt');
>>> Reset(Ufile);
>>> while not Eof(UFile) do
>>> begin
>>> ReadLine(UFile, US);
>>> writeln('US = ',US);
>>> end;
>>> CloseFile(UFile);
>>> end.
>>>
>>> Outputs
>>> US = Line1
>>> US = Line2
>>> US = Line3
>>> which is correct for my test file (Unicode LE encoding created with
>>> Notepad).
>>>
>>> --
>>> Bart
>>> _______________________________________________
>>> fpc-pascal maillist - fpc-pascal at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal