[fpc-pascal] Re: Example: regular expressions and "hash-tables"

Noah Silva shiruba at galapagossoftware.com
Tue Apr 2 04:35:16 CEST 2013


Hi,

I had an entire class on using Regular Expressions (in Java).  They
actually only look scary or hard to read until you get used to them.
 Debugging them can be difficult sometimes though.  If you search, there
are some web pages available that let you check them online in real time.
 Even though I am fairly expert with them in Java, I don't use them much in
Pascal, though.

Thank you,
    Noah Silva

2013/3/23 S. Fisher <expandafter at yahoo.com>

> --- On Fri, 3/22/13, Mattias Gaertner <nc-gaertnma at netcologne.de> wrote:
>
> > From: Mattias Gaertner <nc-gaertnma at netcologne.de>
> > Subject: Re: [fpc-pascal] Re: Example: regular expressions and
> "hash-tables"
> > To: fpc-pascal at lists.freepascal.org
> > Date: Friday, March 22, 2013, 4:11 AM
> > On Fri, 22 Mar 2013 01:19:17 -0700
> > (PDT)
> > "S. Fisher" <expandafter at yahoo.com>
> > wrote:
> >
> > > --- On Thu, 3/21/13, Reinier Olislagers <reinierolislagers at gmail.com>
> > wrote:
> > >
> > > > From: Reinier Olislagers <reinierolislagers at gmail.com>
> > > > Subject: [fpc-pascal] Re: Example: regular
> > expressions and "hash-tables"
> > > > To: "FPC Mailing list" <fpc-pascal at lists.freepascal.org>
> > > > Date: Thursday, March 21, 2013, 5:35 AM
> > > > On 21-3-2013 2:14, S. Fisher wrote:
> > > > > Not actually a hash-table, but an AvgLvlTree,
> > which can
> > > > be used the
> > > > > same way.  The AvgLvlTree unit comes with
> > Lazarus;
> > > > if you don't have
> > > > > that, you can download avglvltree.pas here:
> > > > >
> > > > >
> http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/avglvltree.pas?root=lazarus&view=log
> > > > >
> > > > > There doesn't seem to be a lot of information
> > available
> > > > on using
> > > > > regular expressions and hash-tables in Free
> > Pascal, so
> > > > I decided to
> > > > > post this.
> > > >
> > > > If I'm not mistaken, see:
> > > > http://wiki.lazarus.freepascal.org/AVL_Tree
> > > >
> > > > Thanks,
> > > > Reinier
> > >
> > > Yes, that was one of the very few good references that
> > I found.
> >
> > Thanks. It's a wiki
> >
> >
> > > There doesn't seem to be much demand for hash-tables in
> > the Free
> > > Pascal community.  Hash-tables can be very
> > helpful.
> >
> > There are some hash tables in units "contnrs",
> > "dynhasharray" and
> > "stringhashlist".
>
>
> http://www.freepascal.org/docs-html/fcl/contnrs/index-4.html
>
> I can't seem to find "dyn" on this page.
>
> TFPStringHashTable won't work for my program because it maps strings
> to strings, not to pointers or numbers.  (I wonder if it has an
> enumerator.)
>
> I already experimented with TFPDataHashTable.  It won't work for
> this program, because, incredibly, there is no way to iterate over
> the entries!  The people who write these classes need to have a
> wider experience with using hash-tables.  (Become fluent in the tiny
> language Awk, for example.)
>
> That's why it seemed that I had to download a file from Lazarus
> and compile it in order to have a good associative array that
> maps strings to numbers (or pointers).
>
> So, thanks a lot for creating avglvltree. It does the job perfectly.
>
> >
> >
> > > I don't
> > > see a good way to do what my program does without using
> > associative
> > > arrays of some type.
> >
> > Me too.
> >
> >
> > > Also, there doesn't seem to be much use of regular
> > expressions.
> >
> > Well, I guess Pascal programmers prefer readability over
> > shortness.
> >
>
> I think we can agree 100% that 70-character-long regular
> expressions are nightmarish and should definitely be avoided if at
> all possible!  Trying to decipher one of those could almost drive
> you crazy.
>
> Regular expressions are an absolute necessity for some applications.
> The program "grep", for example.  The user can't be expected or
> allowed to write a routine in C or Pascal that selects certain
> lines, but he can be allowed to supply a regular expression that
> does that. The same is true for text editors.  Any halfway decent
> editor will let you find a line that contains "foo" followed at some
> distance by "bar" by searching for "foo.*bar".  (Even Microsoft Word
> lets you do a r.e. search, although they dishonestly call it
> something else.)
>
> I know that you're saying that a regexpr engine can't do what a
> full-blown parser can.  Very true.  But for some tasks they work
> very well.
>
> Let's say we want to find some dates in a string.  The dates will
> be like either of these patterns:
>
>   yyyy-mm-dd
>   yyyy/mm/dd
>
> In the r.e., \d represents a digit, and [...] is a character class
> or set.  So the r.e. could be
>
>   \d\d\d\d[-/]\d\d[-/]\d\d
>
> Testing that on this string
>
>  '92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06'
>
> we get
>
> 2011-05-22
> 1999/12/22
> 2012-02-28
> 2000/09/03
> 2011-07-05
> 1988/04/06
>
> Now let's say that we don't want to include the dates that have an
> extra digit before the year or after the day.  A non-digit character
> is represented by \D, and the beginning of the string is matched
> by ^.  Alternation or "or" is indicated by |.  So we prefix this to
> the r.e.:
>
>   (^|\D)
>
> It says that the date must be at the very beginning of the string or
> it must be preceded by a non-digit.  A $ will match only at the end of
> the string, so we append this to the r.e.:
>
>   (\D|$)
>
> It says that the date must be followed by a non-digit or must be at
> the very end of the string.  At this point the r.e. looks like this:
>
>   (^|\D)\d\d\d\d[-/]\d\d[-/]\d\d(\D|$)
>
> And the ouput is this:
>
>  1999/12/22-
> -2012-02-28;
>  2011-07-05:
> :1988/04/06
>
> The delimiting characters are now included in the match.  To solve
> this, we take advantage of the fact that parentheses in the r.e.
> not only create a group, but also make a "capture" of what they
> enclose.  So we put them around the part of the pattern that we
> want to keep, yielding this:
>
>  '(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)'
>
> Since there are 3 pairs of parentheses in the r.e., there are
> 3 captures.  We want the second one.  So we change the program
> from
>
>   writeln( re.match[0] );
>
> which prints the entire substring that was matched, to
>
>   writeln( re.match[2] );
>
> which prints the second capture.  The ouput is now
>
> 1999/12/22
> 2012-02-28
> 2011-07-05
> 1988/04/06
>
>
> Just as it's easier to climb a cliff than to descend it, it's
> easier to write a regular expression than to read it.  So don't
> worry too much about that aspect.
>
> Remember that a small r.e. can sometimes do the work of quite a few
> lines of code.
>
> {$mode objfpc}
> {$H+}
>
> uses regexpr;
>
> const date_data =
>   '92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06';
>
> var
>   re : TRegExpr;
>
> begin
>   re := TRegExpr.Create;
>
>   re.Expression := '\d\d\d\d[-/]\d\d[-/]\d\d';
>   if re.exec( date_data ) then  begin
>     writeln( re.match[0] );
>     while re.execNext do
>       writeln( re.match[0] )
>   end;
>   writeln;
>
>   re.Expression := '(^|\D)\d\d\d\d[-/]\d\d[-/]\d\d(\D|$)';
>   if re.exec( date_data ) then  begin
>     writeln( re.match[0] );
>     while re.execNext do
>       writeln( re.match[0] )
>   end;
>   writeln;
>
>   re.Expression := '(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)';
>   if re.exec( date_data ) then  begin
>     writeln( re.match[2] );
>     while re.execNext do
>       writeln( re.match[2] )
>   end;
>   writeln;
>
>   re.free
> end.
>
> _______________________________________________
> fpc-pascal maillist  -  fpc-pascal at lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-pascal
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freepascal.org/pipermail/fpc-pascal/attachments/20130402/73fbe32f/attachment.html>


More information about the fpc-pascal mailing list