[fpc-pascal] Split stream into words
Michael Van Canneyt
michael at freepascal.org
Tue Jul 3 15:26:14 CEST 2018
On Tue, 3 Jul 2018, Marcos Douglas B. Santos wrote:
> On Tue, Jul 3, 2018 at 7:50 AM, Michael Van Canneyt
> <michael at freepascal.org> wrote:
>>
>> On Tue, 3 Jul 2018, Marco van de Voort wrote:
>> Trivial indeed, till you need more fine-grained control.
>> e.g. C needs to be an array of chars that mark word boundaries etc.
>>
>> But I managed to solve the problem with regexps...
>
> How?
I misunderstood how Split works. The regex is the 'word separator' in that
function.
The following correctly gives me all words. Unit uregexpr is the regexp unit
compiled for unicode.
Michael.
--------------
{$mode objfpc}
{$H+}
uses cwstring, sysutils, classes, uregexpr;
Var
  Split : TStringList;
  S : String;
  R : TRegExpr;
  E : TEncoding;
begin
  Split:=TStringList.Create;
  try
    E:=TEncoding.UTF8;
    Split.LoadFromFile(ParamStr(1),E);
    S:=Split.Text;
  finally
    // The list is only needed to load the file.
    Split.Free;
  end;
  R:=TRegExpr.Create;
  try
    // Extend the set matched by \s with punctuation characters.
    R.SpaceChars:=R.SpaceChars+'|&@#"''(§^!{})-[]*%`=+/.;:,?';
    R.LineSeparators:=#10;
    // A word: a run of characters that are neither digits nor space characters.
    R.Expression:='(\b[^\d\s]+\b)';
    if R.Exec(S) then
      repeat
        Writeln('Found: ',System.Copy(S,R.MatchPos[0],R.MatchLen[0]));
      until not R.ExecNext;
  finally
    R.Free;
  end;
end.
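
For reference, the Split behaviour described above (the expression is the
word separator, not the word) can be sketched as follows. This is a minimal
sketch and not code from the original message: it assumes the plain regexpr
unit shipped with FPC rather than the unicode uregexpr build, and the sample
string, the separator class and the variable names are made up for
illustration.

```pascal
program splitsketch;
{$mode objfpc}
{$H+}
// Assumption: the non-unicode regexpr unit; the message above uses uregexpr.
uses classes, regexpr;
var
  Pieces : TStringList;
  R : TRegExpr;
  I : Integer;
begin
  Pieces:=TStringList.Create;
  R:=TRegExpr.Create;
  try
    // The expression describes the separators between words,
    // here one or more spaces or punctuation characters.
    R.Expression:='[ ,.;:!?]+';
    // Split stores the text found between the separator matches in Pieces.
    R.Split('Hello, world; split me',Pieces);
    for I:=0 to Pieces.Count-1 do
      Writeln('Piece: ',Pieces[I]);
  finally
    R.Free;
    Pieces.Free;
  end;
end.
```

With Exec/ExecNext the expression matches the words themselves. With Split it
matches what lies between them, which is why the same expression gives
different results in the two approaches.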