[fpc-pascal] fast text processing

Bee bisma at brawijaya.ac.id
Wed Oct 31 06:19:39 CET 2007


> Give us a test case (some example source code) and I will beat the living crap
> out of any perl script. Perl is built using Cee, so anything Perl can do Cee can
> do better.. which means Pascal can do better or similar. Perl is not written in
> Perl. In other words, perl is just a wrapper around the Cee language.. a similar
> effect can be done by making terse procedural wrappers around verbose pascal
> units.

Quite an expected response. :)

Here is the Perl code:

--->8--- begin perl code --->8---

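# Read Koleksi.dat one <DOC> record at a time.
open(FH, "Koleksi.dat") or die("Koleksi.dat: $!");
$/ = "</DOC>\n";   # input record separator: each read returns one document
while (<FH>) {
   # Extract the contents of every TITLE and TEXT element in the record.
   while (m/<(TITLE|TEXT)>(.+)<\/\1>/gcs) {
     $str = lc $2;
     # Count each run of lowercase ASCII letters as one word.
     while ($str =~ m/\b([a-z]+)\b/gcs) {
       $arr_kata{$1}++;   # per-word frequency ("kata" = word)
       $jum_kata++;       # running total of words
     }
   }
}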
print "Word count: $jum_kata\n";
print "Unique word count: ", scalar keys %arr_kata, "\n";

--->8--- end perl code --->8---
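
For readers less familiar with Perl, the logic above can be restated roughly
in Python. This is only an illustrative sketch, not the Pascal code mentioned
in this thread: the function name count_words is hypothetical, and it takes
the file contents as a string instead of opening Koleksi.dat itself.

```python
import re
from collections import Counter

# Same patterns as the Perl script; DOTALL plays the role of the /s flag.
FIELD_RE = re.compile(r"<(TITLE|TEXT)>(.+)</\1>", re.DOTALL)
WORD_RE = re.compile(r"\b([a-z]+)\b")


def count_words(data):
    """Return (total word count, unique word count) for the given text.

    Hypothetical helper: 'data' is assumed to hold the whole file contents.
    """
    counts = Counter()
    total = 0
    # Splitting on "</DOC>\n" mimics setting $/ so each read is one record.
    for record in data.split("</DOC>\n"):
        for field in FIELD_RE.finditer(record):
            text = field.group(2).lower()
            for word in WORD_RE.finditer(text):
                counts[word.group(1)] += 1
                total += 1
    return total, len(counts)
```

Calling count_words(open("Koleksi.dat").read()) would be the rough equivalent
of the script, assuming the file is in the current directory and decodes
cleanly with the default encoding.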

The compressed "Koleksi.dat" file can be obtained from
http://wawan.web.id/text/Koleksi.zip (957 KB). It's actually an HTML
document taken from somewhere. Using the time command as a simple profiler,
I get:

bee@ubuntu:~$ time perl koleksi.perl

Word count: 126944
Unique word count: 11793

real    0m0.203s
user    0m0.196s
sys     0m0.004s

Tested on Ubuntu (gutsy) on an IBM x60 (1.8 GHz processor and 1 GB
memory).

The Pascal counterpart is almost twice as slow. Though not as simple as
the Perl version, the Pascal code is quite simple and uses only standard
fpc units. But I won't post the code here, so as not to influence your
logic. ;)

> Without any code showing what it is that is 'slow' in pascal, your post is
> meaningless.. ;-)

Just try to beat it. I expect you'll use only the basic or default fpc
units. :)

> It's all about wrappers. Perl is just a wrapper. Pascal can do wrappers too.
> That's why webwrite() is so easy to use, for example. Encapsulation even works
> well even with procedures, not just objects. Perl contains a lot of quick and
> dirty procedures and syntax that are mapped to Cee procedures. That's all.

Understood. But newcomers won't write the wrappers themselves in the
first place; they need the tool (fpc) to provide them already.

-Bee-

has Bee.ography at:
http://beeography.wordpress.com



