[fpc-pascal] fast text processing
Florian Klaempfl
florian at freepascal.org
Wed Oct 31 15:36:13 CET 2007
Vincent Snijders wrote:
> Jeff Pohlmeyer wrote:
>>>> this kludge is about 25% faster than your perl script
>>>> on my machine....
>>
>>> Nope. It's still about twice as slow. :-D
>>
>>
>> I guess it depends on the hardware:
>>
>> % time koleksi.pl # perl
>> Word count: 126944
>> Unique word count: 11793
>>
>> real 0m1.019s
>> user 0m0.992s
>> sys 0m0.028s
>>
>>
>> % time koleksi # fpc
>> Word count:126944
>> Unique word count:11793
>>
>> real 0m0.817s
>> user 0m0.784s
>> sys 0m0.020s
>>
>>
>> AMD-K6-700 / SuSE-10.3 / Linux-2.6.22 / perl-5.8.8 / fpc-2.2.0
>>
>>
>
> Thanks, Jeff, for writing that parser code; I am not good at doing that.
>
> I made it three times as fast on my computer (Windows 2000, fpc 2.3.1,
> P4 1.5 GHz) by using a hash list for the unique word count. Using a larger
> textbuf gave an additional 10% speedup:
>
> program project1;
> {$MODE OBJFPC} {$H+}
>
> uses classes, strings, contnrs;
>
> const
>   bufsize = $1FFF;
>
> var
>   f: text;
>   s:ansistring;
>   wc:longint=0;
>   wl:TStringList;
>   uhl: TFPStringHashTable;
>   i,n:LongInt;
>   textbuf: array[0..bufsize-1] of byte;
>
> begin
>   assign(f, 'Koleksi.dat');
>   reset(f);
>   SetTextBuf(f, textbuf, sizeof(textbuf));
>   wl:=TStringList.Create();
>   uhl:=TFPStringHashTable.Create;
>   while not eof(f) do begin
>     readln(f,s);
>     n:=length(s);
>     if (n>0) then begin
>       StrLower(@s[1]);
>       if (s[1]='<') then begin
>         if StrLComp(@s[1], '<title>',7) = 0 then begin
>           delete(s,1,7);
>         end else continue;
>       end;
>       for i:=1 to n do if not (s[i] in ['a'..'z','0'..'9']) then begin
>         if ( s[i] <> '<' ) then begin
>           s[i]:=#10
>         end else begin
>           s[i]:=#0;
>           SetLength(s,StrLen(@s[1]));
Why not SetLength(s,i)? StrLen is _very_ expensive, because it has to scan
the string from the start. I don't see how another #0 could occur earlier
in the string.
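
To make the comparison concrete, here is a minimal sketch of the two ways to
cut the string at a known position. The program name and the two helper
procedures (TruncateWithStrLen, TruncateWithIndex) are made up for
illustration and are not part of the quoted program. One assumption to flag:
because s[i] has just been overwritten with #0, StrLen stops in front of it
and returns i-1, so the sketch passes i-1 to SetLength to get the same result
as the StrLen version.

```pascal
program truncdemo;
{$MODE OBJFPC} {$H+}

uses strings;

{ Hypothetical helper: cut s in front of position i by writing a #0
  there and rescanning the buffer with StrLen, as in the quoted code. }
procedure TruncateWithStrLen(var s: ansistring; i: longint);
begin
  s[i] := #0;
  SetLength(s, StrLen(@s[1]));  { StrLen walks the buffer from the start }
end;

{ Hypothetical helper: cut s in front of position i using the index
  that the loop already knows, so no scan is needed. }
procedure TruncateWithIndex(var s: ansistring; i: longint);
begin
  SetLength(s, i - 1);
end;

var
  a, b: ansistring;
begin
  a := 'hello<b>world';
  b := a;
  TruncateWithStrLen(a, 6);  { position 6 holds the '<' }
  TruncateWithIndex(b, 6);
  writeln(a, ' ', length(a));
  writeln(b, ' ', length(b));
  if a <> b then halt(1);    { both variants should agree }
end.
```

Both calls should leave 'hello' with a length of 5. The difference is only
in cost: the StrLen variant rereads every character up to the #0 each time
it is called, while the index variant reuses the position the loop has
already found.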