[fpc-pascal] Re: Example: regular expressions and "hash-tables"
S. Fisher
expandafter at yahoo.com
Sat Mar 23 04:25:30 CET 2013
--- On Fri, 3/22/13, Mattias Gaertner <nc-gaertnma at netcologne.de> wrote:
> From: Mattias Gaertner <nc-gaertnma at netcologne.de>
> Subject: Re: [fpc-pascal] Re: Example: regular expressions and "hash-tables"
> To: fpc-pascal at lists.freepascal.org
> Date: Friday, March 22, 2013, 4:11 AM
> On Fri, 22 Mar 2013 01:19:17 -0700
> (PDT)
> "S. Fisher" <expandafter at yahoo.com>
> wrote:
>
> > --- On Thu, 3/21/13, Reinier Olislagers <reinierolislagers at gmail.com>
> wrote:
> >
> > > From: Reinier Olislagers <reinierolislagers at gmail.com>
> > > Subject: [fpc-pascal] Re: Example: regular
> expressions and "hash-tables"
> > > To: "FPC Mailing list" <fpc-pascal at lists.freepascal.org>
> > > Date: Thursday, March 21, 2013, 5:35 AM
> > > On 21-3-2013 2:14, S. Fisher wrote:
> > > > Not actually a hash-table, but an AvgLvlTree,
> which can
> > > be used the
> > > > same way. The AvgLvlTree unit comes with
> Lazarus;
> > > if you don't have
> > > > that, you can download avglvltree.pas here:
> > > >
> > > > http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/avglvltree.pas?root=lazarus&view=log
> > > >
> > > > There doesn't seem to be a lot of information
> available
> > > on using
> > > > regular expressions and hash-tables in Free
> Pascal, so
> > > I decided to
> > > > post this.
> > >
> > > If I'm not mistaken, see:
> > > http://wiki.lazarus.freepascal.org/AVL_Tree
> > >
> > > Thanks,
> > > Reinier
> >
> > Yes, that was one of the very few good references that
> I found.
>
> Thanks. It's a wiki
>
>
> > There doesn't seem to be much demand for hash-tables in
> the Free
> > Pascal community. Hash-tables can be very
> helpful.
>
> There are some hash tables in units "contnrs",
> "dynhasharray" and
> "stringhashlist".
http://www.freepascal.org/docs-html/fcl/contnrs/index-4.html
I can't seem to find "dyn" on this page.
TFPStringHashTable won't work for my program because it maps strings
to strings, not to pointers or numbers. (I wonder if it has an
enumerator.)
I already experimented with TFPDataHashTable. It won't work for
this program, because, incredibly, there is no way to iterate over
the entries! The people who write these classes need to have a
wider experience with using hash-tables. (Become fluent in the tiny
language Awk, for example.)
That's why it seemed that I had to download a file from Lazarus
and compile it in order to have a good associative array that
maps strings to numbers (or pointers).
So, thanks a lot for creating avglvltree. It does the job perfectly.
>
>
> > I don't
> > see a good way to do what my program does without using
> associative
> > arrays of some type.
>
> Me too.
>
>
> > Also, there doesn't seem to be much use of regular
> expressions.
>
> Well, I guess Pascal programmers prefer readability over
> shortness.
>
I think we can agree 100% that 70-character-long regular
expressions are nightmarish and should definitely be avoided if at
all possible! Trying to decipher one of those could almost drive
you crazy.
Regular expressions are an absolute necessity for some applications.
The program "grep", for example. The user can't be expected or
allowed to write a routine in C or Pascal that selects certain
lines, but he can be allowed to supply a regular expression that
does that. The same is true for text editors. Any halfway decent
editor will let you find a line that contains "foo" followed at some
distance by "bar" by searching for "foo.*bar". (Even Microsoft Word
lets you do a r.e. search, although they dishonestly call it
something else.)
I know that you're saying that a regexpr engine can't do what a
full-blown parser can. Very true. But for some tasks they work
very well.
Let's say we want to find some dates in a string. The dates will
be like either of these patterns:
yyyy-mm-dd
yyyy/mm/dd
In the r.e., \d represents a digit, and [...] is a character class
or set. So the r.e. could be
\d\d\d\d[-/]\d\d[-/]\d\d
Testing that on this string
'92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06'
we get
2011-05-22
1999/12/22
2012-02-28
2000/09/03
2011-07-05
1988/04/06
Now let's say that we don't want to include the dates that have an
extra digit before the year or after the day. A non-digit character
is represented by \D, and the beginning of the string is matched
by ^. Alternation or "or" is indicated by |. So we prefix this to
the r.e.:
(^|\D)
It says that the date must be at the very beginning of the string or
it must be preceded by a non-digit. A $ will match only at the end of
the string, so we append this to the r.e.:
(\D|$)
It says that the date must be followed by a non-digit or must be at
the very end of the string. At this point the r.e. looks like this:
(^|\D)\d\d\d\d[-/]\d\d[-/]\d\d(\D|$)
And the ouput is this:
1999/12/22-
-2012-02-28;
2011-07-05:
:1988/04/06
The delimiting characters are now included in the match. To solve
this, we take advantage of the fact that parentheses in the r.e.
not only create a group, but also make a "capture" of what they
enclose. So we put them around the part of the pattern that we
want to keep, yielding this:
'(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)'
Since there are 3 pairs of parentheses in the r.e., there are
3 captures. We want the second one. So we change the program
from
writeln( re.match[0] );
which prints the entire substring that was matched, to
writeln( re.match[2] );
which prints the second capture. The ouput is now
1999/12/22
2012-02-28
2011-07-05
1988/04/06
Just as it's easier to climb a cliff than to descend it, it's
easier to write a regular expression than to read it. So don't
worry too much about that aspect.
Remember that a small r.e. can sometimes do the work of quite a few
lines of code.
{$mode objfpc}
{$H+}
uses regexpr;
const date_data =
'92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06';
var
re : TRegExpr;
begin
re := TRegExpr.Create;
re.Expression := '\d\d\d\d[-/]\d\d[-/]\d\d';
if re.exec( date_data ) then begin
writeln( re.match[0] );
while re.execNext do
writeln( re.match[0] )
end;
writeln;
re.Expression := '(^|\D)\d\d\d\d[-/]\d\d[-/]\d\d(\D|$)';
if re.exec( date_data ) then begin
writeln( re.match[0] );
while re.execNext do
writeln( re.match[0] )
end;
writeln;
re.Expression := '(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)';
if re.exec( date_data ) then begin
writeln( re.match[2] );
while re.execNext do
writeln( re.match[2] )
end;
writeln;
re.free
end.
More information about the fpc-pascal
mailing list