Hi,<div><br></div><div>I took an entire class on regular expressions (in Java). They only look scary or hard to read until you get used to them, although debugging them can be difficult. If you search, you will find web pages that let you test them online in real time. Even though I am fairly proficient with them in Java, I don't use them much in Pascal.</div>
<div><br></div><div>Thank you,</div><div> Noah Silva<br><br><div class="gmail_quote">2013/3/23 S. Fisher <span dir="ltr"><<a href="mailto:expandafter@yahoo.com" target="_blank">expandafter@yahoo.com</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">--- On Fri, 3/22/13, Mattias Gaertner <<a href="mailto:nc-gaertnma@netcologne.de">nc-gaertnma@netcologne.de</a>> wrote:<br>
<br>
> From: Mattias Gaertner <<a href="mailto:nc-gaertnma@netcologne.de">nc-gaertnma@netcologne.de</a>><br>
> Subject: Re: [fpc-pascal] Re: Example: regular expressions and "hash-tables"<br>
> To: <a href="mailto:fpc-pascal@lists.freepascal.org">fpc-pascal@lists.freepascal.org</a><br>
> Date: Friday, March 22, 2013, 4:11 AM<br>
<div><div class="h5">> On Fri, 22 Mar 2013 01:19:17 -0700<br>
> (PDT)<br>
> "S. Fisher" <<a href="mailto:expandafter@yahoo.com">expandafter@yahoo.com</a>><br>
> wrote:<br>
><br>
> > --- On Thu, 3/21/13, Reinier Olislagers <<a href="mailto:reinierolislagers@gmail.com">reinierolislagers@gmail.com</a>> wrote:<br>
> ><br>
> > > From: Reinier Olislagers <<a href="mailto:reinierolislagers@gmail.com">reinierolislagers@gmail.com</a>><br>
> > > Subject: [fpc-pascal] Re: Example: regular expressions and "hash-tables"<br>
> > > To: "FPC Mailing list" <<a href="mailto:fpc-pascal@lists.freepascal.org">fpc-pascal@lists.freepascal.org</a>><br>
> > > Date: Thursday, March 21, 2013, 5:35 AM<br>
> > > On 21-3-2013 2:14, S. Fisher wrote:<br>
> > > > Not actually a hash-table, but an AvgLvlTree, which can be used the<br>
> > > > same way. The AvgLvlTree unit comes with Lazarus; if you don't have<br>
> > > > that, you can download avglvltree.pas here:<br>
> > > ><br>
> > > > <a href="http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/avglvltree.pas?root=lazarus&view=log" target="_blank">http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/avglvltree.pas?root=lazarus&view=log</a><br>
> > > ><br>
> > > > There doesn't seem to be a lot of information available on using<br>
> > > > regular expressions and hash-tables in Free Pascal, so I decided to<br>
> > > > post this.<br>
> > ><br>
> > > If I'm not mistaken, see:<br>
> > > <a href="http://wiki.lazarus.freepascal.org/AVL_Tree" target="_blank">http://wiki.lazarus.freepascal.org/AVL_Tree</a><br>
> > ><br>
> > > Thanks,<br>
> > > Reinier<br>
> ><br>
> > Yes, that was one of the very few good references that I found.<br>
><br>
> Thanks. It's a wiki<br>
><br>
><br>
> > There doesn't seem to be much demand for hash-tables in the Free<br>
> > Pascal community. Hash-tables can be very helpful.<br>
><br>
> There are some hash tables in units "contnrs", "dynhasharray" and<br>
> "stringhashlist".<br>
<br>
<br>
</div></div><a href="http://www.freepascal.org/docs-html/fcl/contnrs/index-4.html" target="_blank">http://www.freepascal.org/docs-html/fcl/contnrs/index-4.html</a><br>
<br>
I can't seem to find "dyn" on this page.<br>
<br>
TFPStringHashTable won't work for my program because it maps strings<br>
to strings, not to pointers or numbers. (I wonder if it has an<br>
enumerator.)<br>
<br>
I already experimented with TFPDataHashTable. It won't work for<br>
this program, because, incredibly, there is no way to iterate over<br>
the entries! The people who write these classes need to have a<br>
wider experience with using hash-tables. (Become fluent in the tiny<br>
language Awk, for example.)<br>
<br>
That's why it seemed that I had to download a file from Lazarus<br>
and compile it in order to have a good associative array that<br>
maps strings to numbers (or pointers).<br>
<br>
So, thanks a lot for creating avglvltree. It does the job perfectly.<br>
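<br>
For anyone who cannot pull in the Lazarus unit, one alternative that ships with<br>
FPC is the generic TFPGMap in unit fgl. It is a sorted list rather than a true<br>
hash-table, so lookups are not constant-time, but it maps strings to numbers and<br>
can be iterated by index. A minimal sketch, assuming objfpc mode; the names<br>
TStrIntMap, CountWord and the sample words are made up for illustration:<br>
<br>

```pascal
program WordTally;
{$mode objfpc}{$H+}
uses fgl;

type
  { Hypothetical alias: string keys, integer values. }
  TStrIntMap = specialize TFPGMap<string, Integer>;

{ Adds 1 to the count stored under w, creating the entry if needed. }
procedure CountWord(counts: TStrIntMap; const w: string);
var
  idx: Integer;
begin
  idx := counts.IndexOf(w);
  if idx >= 0 then
    counts.Data[idx] := counts.Data[idx] + 1
  else
    counts.Add(w, 1);
end;

const
  words: array[0..4] of string = ('foo', 'bar', 'foo', 'baz', 'foo');
var
  counts: TStrIntMap;
  i: Integer;
begin
  counts := TStrIntMap.Create;
  try
    counts.Sorted := True;  { keep the keys in order }
    for i := Low(words) to High(words) do
      CountWord(counts, words[i]);
    { Iteration is plain indexing over Keys[] and Data[]. }
    for i := 0 to counts.Count - 1 do
      writeln(counts.Keys[i], ' ', counts.Data[i]);
  finally
    counts.Free;
  end;
end.
```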
<div class="im"><br>
><br>
><br>
> > I don't see a good way to do what my program does without using associative<br>
> > arrays of some type.<br>
><br>
> Me too.<br>
><br>
><br>
> > Also, there doesn't seem to be much use of regular expressions.<br>
><br>
> Well, I guess Pascal programmers prefer readability over shortness.<br>
><br>
<br>
</div>I think we can agree 100% that 70-character-long regular<br>
expressions are nightmarish and should definitely be avoided if at<br>
all possible! Trying to decipher one of those could almost drive<br>
you crazy.<br>
<br>
Regular expressions are an absolute necessity for some applications.<br>
The program "grep", for example. The user can't be expected or<br>
allowed to write a routine in C or Pascal that selects certain<br>
lines, but he can be allowed to supply a regular expression that<br>
does that. The same is true for text editors. Any halfway decent<br>
editor will let you find a line that contains "foo" followed at some<br>
distance by "bar" by searching for "foo.*bar". (Even Microsoft Word<br>
lets you do a r.e. search, although they dishonestly call it<br>
something else.)<br>
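<br>
The grep idea takes only a few lines with TRegExpr. This is a rough sketch, not<br>
a real grep: LineMatches is a made-up helper and the three sample lines are<br>
invented for the demonstration. It prints the lines that contain "foo" followed<br>
at some distance by "bar":<br>
<br>

```pascal
program MiniGrep;
{$mode objfpc}{$H+}
uses regexpr;

const
  { Invented sample input. }
  sample: array[0..2] of string = (
    'foo then bar',
    'bar then foo',
    'food at the barn');

{ Returns True when line contains a match for pattern. }
function LineMatches(const pattern, line: string): Boolean;
var
  re: TRegExpr;
begin
  re := TRegExpr.Create;
  try
    re.Expression := pattern;
    Result := re.Exec(line);
  finally
    re.Free;
  end;
end;

var
  i: Integer;
begin
  { Print only the lines the user-supplied pattern selects. }
  for i := Low(sample) to High(sample) do
    if LineMatches('foo.*bar', sample[i]) then
      writeln(sample[i]);
end.
```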
<br>
I know that you're saying that a regexpr engine can't do what a<br>
full-blown parser can. Very true. But for some tasks they work<br>
very well.<br>
<br>
Let's say we want to find some dates in a string. The dates will<br>
be like either of these patterns:<br>
<br>
yyyy-mm-dd<br>
yyyy/mm/dd<br>
<br>
In the r.e., \d represents a digit, and [...] is a character class<br>
or set. So the r.e. could be<br>
<br>
\d\d\d\d[-/]\d\d[-/]\d\d<br>
<br>
Testing that on this string<br>
<br>
'92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06'<br>
<br>
we get<br>
<br>
2011-05-22<br>
1999/12/22<br>
2012-02-28<br>
2000/09/03<br>
2011-07-05<br>
1988/04/06<br>
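<br>
The repeated \d can also be written with a count: \d{4} means exactly four<br>
digits. A small sketch of the same search in that form, assuming the regexpr<br>
unit that ships with FPC; CollectMatches is a hypothetical helper, not part of<br>
the library:<br>
<br>

```pascal
program DateQuantifier;
{$mode objfpc}{$H+}
uses classes, regexpr;

const
  date_data =
    '92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06';

{ Appends every non-overlapping match of pattern in subject to list. }
procedure CollectMatches(const pattern, subject: string; list: TStrings);
var
  re: TRegExpr;
begin
  re := TRegExpr.Create;
  try
    re.Expression := pattern;
    if re.Exec(subject) then
      repeat
        list.Add(re.Match[0]);
      until not re.ExecNext;
  finally
    re.Free;
  end;
end;

var
  found: TStringList;
  i: Integer;
begin
  found := TStringList.Create;
  try
    { Counted form of \d\d\d\d[-/]\d\d[-/]\d\d }
    CollectMatches('\d{4}[-/]\d{2}[-/]\d{2}', date_data, found);
    for i := 0 to found.Count - 1 do
      writeln(found[i]);
  finally
    found.Free;
  end;
end.
```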
<br>
Now let's say that we don't want to include the dates that have an<br>
extra digit before the year or after the day. A non-digit character<br>
is represented by \D, and the beginning of the string is matched<br>
by ^. Alternation or "or" is indicated by |. So we prefix this to<br>
the r.e.:<br>
<br>
(^|\D)<br>
<br>
It says that the date must be at the very beginning of the string or<br>
it must be preceded by a non-digit. A $ will match only at the end of<br>
the string, so we append this to the r.e.:<br>
<br>
(\D|$)<br>
<br>
It says that the date must be followed by a non-digit or must be at<br>
the very end of the string. At this point the r.e. looks like this:<br>
<br>
(^|\D)\d\d\d\d[-/]\d\d[-/]\d\d(\D|$)<br>
<br>
And the output is this:<br>
<br>
1999/12/22-<br>
-2012-02-28;<br>
2011-07-05:<br>
:1988/04/06<br>
<br>
The delimiting characters are now included in the match. To solve<br>
this, we take advantage of the fact that parentheses in the r.e.<br>
not only create a group, but also make a "capture" of what they<br>
enclose. So we put them around the part of the pattern that we<br>
want to keep, yielding this:<br>
<br>
'(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)'<br>
<br>
Since there are 3 pairs of parentheses in the r.e., there are<br>
3 captures. We want the second one. So we change the program<br>
from<br>
<br>
writeln( re.match[0] );<br>
<br>
which prints the entire substring that was matched, to<br>
<br>
writeln( re.match[2] );<br>
<br>
which prints the second capture. The output is now<br>
<br>
1999/12/22<br>
2012-02-28<br>
2011-07-05<br>
1988/04/06<br>
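<br>
One caveat: the trailing (\D|$) consumes the delimiter. When two dates are<br>
separated by a single character, execNext resumes after that character, the<br>
leading (^|\D) has nothing left to match, and the second date is skipped. The<br>
test string avoids this only because its valid dates are separated by two<br>
characters. A sketch of one workaround, assuming ExecPos, MatchPos and MatchLen<br>
behave as in the regexpr unit shipped with FPC: restart each search right after<br>
the captured date, so the delimiter is available again. ExtractDates is a<br>
made-up name:<br>
<br>

```pascal
program DatesSingleDelimiter;
{$mode objfpc}{$H+}
uses classes, regexpr;

{ Adds every properly delimited date found in subject to dates. }
procedure ExtractDates(const subject: string; dates: TStrings);
var
  re: TRegExpr;
  found: Boolean;
  resumeAt: PtrInt;
begin
  re := TRegExpr.Create;
  try
    re.Expression := '(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)';
    found := re.Exec(subject);
    while found do begin
      dates.Add(re.Match[2]);
      { Resume at the character just after capture 2, so that the
        delimiter can serve as the leading \D of the next match. }
      resumeAt := re.MatchPos[2] + re.MatchLen[2];
      found := (resumeAt <= Length(subject)) and re.ExecPos(resumeAt);
    end;
  finally
    re.Free;
  end;
end;

var
  dates: TStringList;
  i: Integer;
begin
  dates := TStringList.Create;
  try
    { Two dates separated by a single space. }
    ExtractDates('2011-07-05 1988/04/06', dates);
    for i := 0 to dates.Count - 1 do
      writeln(dates[i]);
  finally
    dates.Free;
  end;
end.
```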
<br>
<br>
Just as it's easier to climb a cliff than to descend it, it's<br>
easier to write a regular expression than to read it. So don't<br>
worry too much about that aspect.<br>
<br>
Remember that a small r.e. can sometimes do the work of quite a few<br>
lines of code.<br>
<br>
{$mode objfpc}<br>
{$H+}<br>
<br>
uses regexpr;<br>
<br>
const date_data =<br>
'92011-05-22 1999/12/22--2012-02-28; 2000/09/033 2011-07-05::1988/04/06';<br>
<div class="im"><br>
var<br>
re : TRegExpr;<br>
<br>
begin<br>
re := TRegExpr.Create;<br>
try<br>
<br>
</div>{ Pass 1: bare pattern; also matches dates embedded in longer digit runs. }<br>
re.Expression := '\d\d\d\d[-/]\d\d[-/]\d\d';<br>
if re.exec( date_data ) then begin<br>
writeln( re.match[0] );<br>
while re.execNext do<br>
writeln( re.match[0] )<br>
end;<br>
writeln;<br>
<br>
{ Pass 2: require a non-digit or the string boundary on both sides. }<br>
re.Expression := '(^|\D)\d\d\d\d[-/]\d\d[-/]\d\d(\D|$)';<br>
if re.exec( date_data ) then begin<br>
writeln( re.match[0] );<br>
while re.execNext do<br>
writeln( re.match[0] )<br>
end;<br>
writeln;<br>
<br>
{ Pass 3: same pattern, but print only capture 2, the date itself. }<br>
re.Expression := '(^|\D)(\d\d\d\d[-/]\d\d[-/]\d\d)(\D|$)';<br>
if re.exec( date_data ) then begin<br>
writeln( re.match[2] );<br>
while re.execNext do<br>
writeln( re.match[2] )<br>
end;<br>
writeln;<br>
<br>
finally<br>
re.free { released even if an expression fails to compile }<br>
end<br>
end.<br>
<div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br></div>