[fpc-pascal] Example: regular expressions and "hash-tables"
S. Fisher
expandafter at yahoo.com
Thu Mar 21 02:14:59 CET 2013
Not actually a hash-table, but an AvgLvlTree, which can be used the
same way. The AvgLvlTree unit comes with Lazarus; if you don't have
that, you can download avglvltree.pas here:
http://svn.freepascal.org/cgi-bin/viewvc.cgi/trunk/components/lazutils/avglvltree.pas?root=lazarus&view=log
There doesn't seem to be a lot of information available on using
regular expressions and hash-tables in Free Pascal, so I decided to
post this.
The program reads a text file and counts the number of unique words,
and also displays the number of times the most common word was found.
Since TStringToPointerTree actually maps strings to pointers, I wrote
some tiny utility functions to handle the casting from integers to
pointers and back.
Although I was worried that this tree-structure would be slower than a
hash-table, it turned out to be quite fast.
{$mode objfpc}
uses
AvgLvlTree,
regexpr,
classes,
dateutils,
Sysutils;
const
file_to_process = 'Bartlett--Quotations.txt';
type
// string-integer table
t_si_table = TStringToPointerTree;
t_si_item = PStringToPointerItem;
procedure si_table_put( table: t_si_table; k: string; v: longint );
begin
table[ k ] := pointer( v )
end;
function si_table_get( table : t_si_table; key : string ): longint;
begin
exit( longint( table[ key ] ) )
end;
procedure si_table_inc( table: t_si_table; k: string );
begin
table[ k ] := pointer( si_table_get( table, k ) + 1 ) ;
end;
function si_table_key( item : t_si_item ): string;
begin
exit( item^.name )
end;
function si_table_val( item : t_si_item ): longint;
begin
exit( longint( item^.value ) )
end;
function si_table_new( sensitive : boolean ): t_si_table;
begin
exit( t_si_table.create( sensitive ) )
end;
var
table : t_si_table ;
table_item : t_si_item ;
line, word : string;
re : TRegExpr;
max_count, count : int32;
lines : TStringList;
time_1 : TDateTime;
begin
time_1 := time;
re := TRegExpr.Create;
re.Expression := '(?i)[a-z]+' ; // case-insensitive
table := si_table_new( true ) ; // true to make it case-sensitive.
lines := TStringList.Create;
lines.LoadFromFile( file_to_process );
for line in lines do begin
if re.exec( line ) then begin
si_table_inc( table, re.match[0] );
while re.execNext do
si_table_inc( table, re.match[0] );
end;
end;
lines.free;
writeln( 'Unique word count: ', table.count );
max_count := 0;
for table_item in table do begin
count := si_table_val( table_item );
if count > max_count then begin
word := si_table_key( table_item );
max_count := count
end;
end;
writeln( 'Commonest word: ', word, ' (', max_count, ')' );
writeln( 'Elapsed time: ',
milliSecondsBetween( time_1, time ),
' milliseconds.' );
table.free;
re.free
end.
Processing the file "Bartlett--Quotations.txt", which is about
319 kilobytes in size, produces this output:
Unique word count: 8911
Commonest word: the (2200)
Elapsed time: 94 milliseconds.
Documentation for TRegExpr can be found here:
http://regexpstudio.com/TRegExpr/Help/tregexpr_interface.html
More information about the fpc-pascal
mailing list