[fpc-pascal] fast text processing
Vincent Snijders
vsnijders at quicknet.nl
Wed Oct 31 11:46:59 CET 2007
Jeff Pohlmeyer schreef:
>>> this kludge is about 25% faster than your perl script
>>> on my machine....
>
>> Nope. It's still more or less twice slower. :-D
>
>
> I guess it depends on the hardware:
>
> % time koleksi.pl # perl
> Word count: 126944
> Unique word count: 11793
>
> real 0m1.019s
> user 0m0.992s
> sys 0m0.028s
>
>
> % time koleksi # fpc
> Word count:126944
> Unique word count:11793
>
> real 0m0.817s
> user 0m0.784s
> sys 0m0.020s
>
>
> AMD-K6-700 / SuSE-10.3 / Linux-2.6.22 / perl-5.8.8 / fpc-2.2.0
>
>
Thanks Jeff, for writing that parser code, I am not good in doing that.
I made it three times as fast on my computer (windows 2000, fpc 2.3.1, P4 1.5 Ghz)
using a hashlist for the unique word count. Using a larger textbuf gave an
additional 10% speed up:
program project1;
{$MODE OBJFPC} {$H+}
uses classes, strings, contnrs;
const
bufsize = $1FFF;
var
f: text;
s:ansistring;
wc:longint=0;
wl:TStringList;
uhl: TFPStringHashTable;
i,n:LongInt;
textbuf: array[0..bufsize-1] of byte;
begin
assign(f, 'Koleksi.dat');
reset(f);
SetTextBuf(f, textbuf, sizeof(textbuf));
wl:=TStringList.Create();
uhl:=TFPStringHashTable.Create;
while not eof(f) do begin
readln(f,s);
n:=length(s);
if (n>0) then begin
StrLower(@s[1]);
if (s[1]='<') then begin
if StrLComp(@s[1], '<title>',7) = 0 then begin
delete(s,1,7);
end else continue;
end;
for i:=1 to n do if not (s[i] in ['a'..'z','0'..'9']) then begin
if ( s[i] <> '<' ) then begin
s[i]:=#10
end else begin
s[i]:=#0;
SetLength(s,StrLen(@s[1]));
break;
end;
end;
wl.Text:=s;
for i:=0 to wl.Count-1 do begin
s:=wl[i];
for n:=1 to length(s) do if (s[n] in ['0'..'9']) then begin
s:='';
break;
end;
if (s<>'') then begin
inc(wc);
if uhl.Find(s) = nil then
uhl.Add(s,'');
end;
end;
end;
end;
close(f);
WriteLn('Word count:',wc, #10'Unique word count:', uhl.Count);
end.
More information about the fpc-pascal
mailing list