[fpc-pascal] Split stream into words

Michael Van Canneyt michael at freepascal.org
Tue Jul 3 15:26:14 CEST 2018



On Tue, 3 Jul 2018, Marcos Douglas B. Santos wrote:

> On Tue, Jul 3, 2018 at 7:50 AM, Michael Van Canneyt
> <michael at freepascal.org> wrote:
>>
>> On Tue, 3 Jul 2018, Marco van de Voort wrote:
>> Trivial indeed, till you need more fine-grained control.
>> e.g. C needs to be an array of chars that mark word boundaries etc.
>>
>> But I managed to solve the problem with regexps...
>
> How?

I had misunderstood how Split works: the regex you pass to it is the word
separator, not a pattern describing the words themselves.
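To illustrate the point about Split, here is a minimal sketch. It assumes the Split in question is TRegExpr.Split from FPC's regexpr unit. The expression describes what lies between the words, and Split cuts the input at every match. The program name, the sample string and the ASCII-only separator class '[^A-Za-z]+' are illustrative assumptions and do not come from the thread.

```pascal
program SplitDemo;

{$mode objfpc}{$H+}

uses
  classes, regexpr;

var
  R: TRegExpr;
  Words: TStringList;
  I: Integer;

begin
  R := TRegExpr.Create;
  Words := TStringList.Create;
  try
    // The expression describes what lies BETWEEN the words:
    // here (an illustrative choice), any run of non-ASCII-letter characters.
    R.Expression := '[^A-Za-z]+';
    // Split cuts the input at every separator match and adds the pieces to Words.
    R.Split('Hello, world: split me', Words);
    for I := 0 to Words.Count - 1 do
      Writeln('[', Words[I], ']');
  finally
    Words.Free;
    R.Free;
  end;
end.
```

A separator at the very start or end of the input should produce an empty first or last piece with this approach. Matching the words directly with an Exec/ExecNext loop avoids that.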

The following correctly gives me all words. Unit uregexpr is the regexpr
unit compiled for Unicode.

Michael.

--------------

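{$mode objfpc}
{$H+}
uses cwstring, sysutils, classes, uregexpr;

Var
  Split : TStringList;
  S : String;
  R : TRegExpr;

begin
  if ParamCount<1 then
    begin
    Writeln('Usage: ',ParamStr(0),' filename');
    Halt(1);
    end;
  // Read the whole file as UTF-8 and join it into a single string.
  Split:=TStringList.Create;
  try
    Split.LoadFromFile(ParamStr(1),TEncoding.UTF8);
    S:=Split.Text;
  finally
    Split.Free;
  end;
  R:=TRegExpr.Create;
  try
    // Treat punctuation as whitespace, so \s (and thus [^\d\s]) excludes it.
    R.SpaceChars:=R.SpaceChars+'|&@#"''(§^!{})-[]*%`=+/.;:,?';
    R.LineSeparators:=#10;
    // A word: a run of characters that are neither digits nor "space" chars.
    R.Expression:='(\b[^\d\s]+\b)';
    if R.Exec(S) then
      repeat
        // Match[0] is the matched text itself. This avoids mixing positions in
        // the regex engine's Unicode copy of S with byte positions in S.
        Writeln('Found: ',R.Match[0]);
      until not R.ExecNext;
  finally
    R.Free;
  end;
end.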
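For comparison, here is a sketch of the "trivial" approach from the quoted discussion, where C is a set of characters that mark word boundaries. The names SplitWords and Delims, the delimiter set and the sample string are hypothetical. The scan works on bytes, so it only handles single-byte delimiters. A character like '§' is two bytes in UTF-8 and cannot be a single element of a set of Char, which is presumably part of why the regex route is attractive here.

```pascal
program CharSetSplit;

{$mode objfpc}{$H+}

uses
  classes;

const
  // Hypothetical delimiter set: every character listed marks a word boundary.
  Delims: set of Char = [' ', #9, #10, #13, ',', '.', ';', ':', '!', '?', '(', ')', '"'];

// Append every maximal run of non-delimiter characters in S to Words.
procedure SplitWords(const S: String; Words: TStrings);
var
  I, Start: Integer;
begin
  Start := 1;
  for I := 1 to Length(S) + 1 do
    // I = Length(S)+1 acts as a virtual delimiter that flushes the last word;
    // short-circuit evaluation keeps S[I] from being read out of range.
    if (I > Length(S)) or (S[I] in Delims) then
    begin
      if I > Start then
        Words.Add(Copy(S, Start, I - Start));
      Start := I + 1;
    end;
end;

var
  Words: TStringList;
  I: Integer;

begin
  Words := TStringList.Create;
  try
    SplitWords('Hello, world: split me', Words);
    for I := 0 to Words.Count - 1 do
      Writeln('[', Words[I], ']');
  finally
    Words.Free;
  end;
end.
```

Consecutive delimiters are skipped rather than producing empty words, because a piece is only added when it is non-empty.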

