[fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

LacaK lacak at zoznam.sk
Wed Sep 4 13:39:24 CEST 2019


> You may be able to improve on this using system.BlockRead.
Probably yes, but then I must read into a local buffer and scan that 
buffer for CR/LF.

My function UCS2ReadLn() must then return only the part of the string up 
to CR/LF and return the rest on the next call
(so I must keep the unprocessed part in a global buffer).
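
A minimal sketch of that buffering idea, assuming a little-endian file opened as an untyped file and read with BlockRead. TLineBuffer, NextByte and ReadLine are hypothetical names, not part of the RTL. The unprocessed bytes are kept in a record passed by the caller instead of a global buffer, and a lone CR is not treated as a line end.

```pascal
{$mode objfpc}{$H+}
program BlockReadLines;

type
  // Hypothetical helper: holds the bytes read from disk but not yet consumed.
  TLineBuffer = record
    Data: array[0..4095] of Byte;
    Len: LongInt;   // number of valid bytes in Data
    Pos: LongInt;   // index of the next unread byte
  end;

// Fetch one byte, refilling the buffer with BlockRead when it runs empty.
// Returns False at end of file.
function NextByte(var F: file; var B: TLineBuffer; out Value: Byte): Boolean;
begin
  Value := 0;
  if B.Pos >= B.Len then
  begin
    B.Pos := 0;
    B.Len := 0;
    BlockRead(F, B.Data, SizeOf(B.Data), B.Len);
  end;
  Result := B.Pos < B.Len;
  if Result then
  begin
    Value := B.Data[B.Pos];
    Inc(B.Pos);
  end;
end;

// Read one line; returns False once no more data is available.
// Whatever follows the LF stays in B for the next call.
function ReadLine(var F: file; var B: TLineBuffer; out S: UnicodeString): Boolean;
var
  B0, B1: Byte;
  U: Word;
begin
  S := '';
  Result := False;
  while NextByte(F, B, B0) and NextByte(F, B, B1) do
  begin
    Result := True;
    U := Word(B0) or (Word(B1) shl 8);   // assumption: low byte first (LE file)
    if U = $000A then
      Exit;                              // LF ends the line
    if (U <> $000D) and (U <> $FEFF) then
      S := S + WideChar(U);              // skip CR and the BOM
  end;
end;

var
  F: file;
  Buf: TLineBuffer;
  Line: UnicodeString;
begin
  FileMode := 0;                 // open read-only
  AssignFile(F, 'ucs2.txt');     // file name from the earlier example; must exist
  Reset(F, 1);                   // record size 1 = byte-wise access
  Buf.Len := 0;
  Buf.Pos := 0;
  while ReadLine(F, Buf, Line) do
    WriteLn('US = ', Line);
  CloseFile(F);
end.
```

Unlike a character-by-character Read, this touches the disk once per 4096 bytes, and empty lines are returned as empty strings rather than skipped.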


>
> Also, you are assuming low order byte first which may not be portable.

Yes. In my case LE is sufficient, as long as I check for the presence of the BOM $FF $FE.
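
A small sketch of such a check, with the assumption that the first two bytes of the file have already been read. TByteOrder, DetectBom and MakeUnit are hypothetical names used only for illustration. Building each code unit with shifts instead of a variant record keeps the result independent of the byte order of the machine running the program.

```pascal
{$mode objfpc}{$H+}
program BomCheck;

type
  // Hypothetical enumeration for the byte order found in the file.
  TByteOrder = (boUnknown, boLittleEndian, boBigEndian);

// Classify the first two bytes of a file as a UTF-16 byte order mark.
function DetectBom(B0, B1: Byte): TByteOrder;
begin
  if (B0 = $FF) and (B1 = $FE) then
    Result := boLittleEndian
  else if (B0 = $FE) and (B1 = $FF) then
    Result := boBigEndian
  else
    Result := boUnknown;
end;

// Combine two file bytes into one code unit using arithmetic only.
// Anything other than boBigEndian is treated as little-endian here.
function MakeUnit(B0, B1: Byte; Order: TByteOrder): WideChar;
begin
  if Order = boBigEndian then
    Result := WideChar(Word((Word(B0) shl 8) or B1))
  else
    Result := WideChar(Word((Word(B1) shl 8) or B0));
end;

begin
  if DetectBom($FF, $FE) = boLittleEndian then
    WriteLn('little-endian BOM');
  // 'A' is U+0041: bytes $41 $00 in LE, $00 $41 in BE; both lines print 65.
  WriteLn(Ord(MakeUnit($41, $00, boLittleEndian)));
  WriteLn(Ord(MakeUnit($00, $41, boBigEndian)));
end.
```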

L.

>
> On 04/09/2019 11:14, LacaK wrote:
>> Nice! Thank you very much.
>>
>> As an alternative for F:TextFile I am using:
>>
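>> procedure UCS2ReadLn(var F: TextFile; out s: String);
>> var
>>   // Overlay two bytes with one WideChar; a[0] is the low byte only on
>>   // a little-endian host, which matches a little-endian file.
>>   c: record
>>       case boolean of
>>        false: (a: array[0..1] of AnsiChar);
>>        true : (w: WideChar);
>>      end;
>> begin
>>   s:='';
>>   while not Eof(F) do begin
>>     System.Read(F,c.a[0]);
>>     System.Read(F,c.a[1]);
>>     if c.w in [#10,#13] then
>>       // A line break before any text is skipped (this covers the LF left
>>       // over from CR/LF, and also drops empty lines); otherwise it ends
>>       // the line.
>>       if s = '' then {begin of line} else break {end of line}
>>     else
>>       s := s + c.w;
>>   end;
>> end;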
>>
>> which also works for me, but I would like to have a better solution. 
>> I will try LoadFromFile with TEncoding once FPC 3.2 is out.
>>
>> -L.
>>
>>> Stupid and lazy workaround, probably not suitable for larger files.
>>>
>>> {$mode objfpc}
>>> {$h+}
>>> uses
>>>    sysutils;
>>>
>>> type
>>>    TUCS2TextFile = file of WideChar;
>>>
>>> procedure ReadLine(var F: TUCS2TextFile; out S: UnicodeString);
>>> var
>>>    WC: WideChar;
>>> begin
>>>    //Assume file is opened for read
>>>    S := '';
>>>    while not Eof(F) do
>>>    begin
>>>      Read(F, WC);
>>>      if WC = WideChar(#$000A) then
>>>        exit
>>>      else
>>>        if (WC <> WideChar(#$000D)) and (WC<>WideChar(#$FEFF {Unicode LE BOM})) then
>>>          S := S + WC;
>>>    end;
>>> end;
>>>
>>> var
>>>    UFile: TUCS2TextFile;
>>>    US: UnicodeString;
>>> begin
>>>    AssignFile(UFile, 'ucs2.txt');
>>>    Reset(Ufile);
>>>    while not Eof(UFile) do
>>>    begin
>>>      ReadLine(UFile, US);
>>>      writeln('US = ',US);
>>>    end;
>>>    CloseFile(UFile);
>>> end.
>>>
>>> Outputs
>>> US = Line1
>>> US = Line2
>>> US = Line3
>>> which is correct for my test file (Unicode LE encoding created with 
>>> Notepad).
>>>
>>> -- 
>>> Bart
>>> _______________________________________________
>>> fpc-pascal maillist  -  fpc-pascal at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal