[fpc-pascal] fast text processing
Bee
bisma at brawijaya.ac.id
Wed Oct 31 06:19:39 CET 2007
> Give us a test case (some example source code) and I will beat the living crap
> out of any perl script. Perl is built using Cee, so anything Perl can do Cee can
> do better.. which means Pascal can do better or similar. Perl is not written in
> Perl. In other words, perl is just a wrapper around the Cee language.. a similar
> effect can be done by making terse procedural wrappers around verbose pascal
> units.
Quite an expected response. :)
Here is the Perl code:
--->8--- begin perl code --->8---
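open(FH, "Koleksi.dat") or die("Koleksi.dat: $!");
$/ = "</DOC>\n";    # read one </DOC>-terminated record per iteration
while (<FH>) {
    # find <TITLE>...</TITLE> and <TEXT>...</TEXT> spans in the record
    while (m/<(TITLE|TEXT)>(.+)<\/\1>/gcs) {
        $str = lc $2;
        # count each a-z word delimited by word boundaries
        while ($str =~ m/\b([a-z]+)\b/gcs) {
            $arr_kata{$1}++;    # per-word frequency
            $jum_kata++;        # total word count
        }
    }
}
print "Word count: $jum_kata\n";
print "Unique word count: ", scalar keys %arr_kata, "\n";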
--->8--- end perl code --->8---
The compressed "Koleksi.dat" file can be obtained from
http://wawan.web.id/text/Koleksi.zip (957 KB). It's actually an HTML
document taken from somewhere. Using the time command as a simple profiler,
the result is:
bee@ubuntu:~$ time perl koleksi.perl
Word count: 126944
Unique word count: 11793
real 0m0.203s
user 0m0.196s
sys 0m0.004s
Tested on Ubuntu (gutsy) on an IBM x60 (1.8 GHz processor and 1 GB
memory).
The Pascal counterpart is almost twice as slow. Though not as simple as
the Perl version, the Pascal code is quite simple and uses only standard
fpc units. But I won't post the code here, so as not to influence your logic. ;)
> Without any code showing what it is that is 'slow' in pascal, your post is
> meaningless.. ;-)
Just try to beat it. I expect you'll use only the basic or default fpc
units. :)
> It's all about wrappers. Perl is just a wrapper. Pascal can do wrappers too.
> That's why webwrite() is so easy to use, for example. Encapsulation even works
> well even with procedures, not just objects. Perl contains a lot of quick and
> dirty procedures and syntax that are mapped to Cee procedures. That's all.
Understood. But newcomers won't write the wrappers themselves in the
first place; the wrappers need to be provided by the tool (fpc)
already.
-Bee-
has Bee.ography at:
http://beeography.wordpress.com