[fpc-pascal]Word count function
James_Wilson at i2.com
Mon Oct 1 16:34:00 CEST 2001
Gabor;
> It just happened that I needed this function myself, and I found
> two errors in my pseudo-code. Here it is, corrected:
> [snip]
Your algorithm is similar to what I had devised initially, but I was not
happy with the performance. In my case I was using it on files that could
(conceivably) be 20+ MB. Some of those files were taking 3-5 minutes for
the calculations (even on a 933 MHz PIII). That, I felt, was
unacceptable.
Basically, I had a const holding my 10 or 12 delimiter characters, and I
would then use the pos() function to check whether each character of every
string was a delimiter. It seemed like a good idea at the time, but it was
very slow.
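For illustration only, a minimal sketch of that kind of scan, where every single character goes through pos(). It is not the original code: the name CountWordsWithPos is hypothetical, every non-delimiter is assumed to be part of a word, and the trailing #9 in the delimiter list is assumed to mean a tab character.

```pascal
program PosScanSketch;

const
  // assumed delimiter list; #9 sits outside the quotes so it is a tab
  DELIMITERS = ' .,!?_-)}]>;:=@/\'#9;

// Hypothetical helper: calls pos() once for every character of S.
function CountWordsWithPos (const S : string) : longint;
var
  Index, Words : longint;
  InWord : boolean;
begin
  Words := 0;
  InWord := false;
  for Index := 1 to length (S) do
    if pos (S [Index], DELIMITERS) <> 0 then
    begin
      // a delimiter closes a word only if one was open
      if InWord then
        inc (Words);
      InWord := false;
    end
    else
      InWord := true;
  // a word running to the end of the string has no trailing delimiter
  if InWord then
    inc (Words);
  CountWordsWithPos := Words;
end;

begin
  writeln (CountWordsWithPos ('Hello, world. Foo'));
end.
```

Each pos() call walks the delimiter string from the start, so a character that is not a delimiter costs a full pass over all the delimiters.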
What I found to be faster was to do something like this:

for each character of the string, check whether it's an upper or lower
case letter or a number;
  if TRUE, keep counting;
  if FALSE, do pos() on the delimiters const
    if TRUE, it's the end of a word -- add to word count
    if FALSE, it's not a word -- keep looking

Here's an idea of what I had done...
Function GetWords (StringToCheck : string) : longint;
const
  // #9 (tab) goes outside the quotes; inside them it would add '#' and '9'
  DELIMITERS = ' .,!?_-)}]>;:=@/\'#9;
var
  Index : longint;
  LineLength : longint;
  WordStart : longint;
  Words : longint;
  CurrentChar : char;
begin
  Words := 0;
  Index := 1;
  LineLength := length (StringToCheck);
  while Index <= LineLength do // never entered for empty strings
  begin
    WordStart := Index;
    CurrentChar := StringToCheck [Index];
    // skip all the "word" characters; the three ranges are alternatives (or)
    while (Index <= LineLength) and
          (((CurrentChar >= 'a') and (CurrentChar <= 'z')) or
           ((CurrentChar >= 'A') and (CurrentChar <= 'Z')) or
           ((CurrentChar >= '0') and (CurrentChar <= '9'))) do
    begin
      inc (Index);
      if Index <= LineLength then
        CurrentChar := StringToCheck [Index];
    end;
    // a word counts when a delimiter or the end of the string follows it;
    // double delims, like a period followed by a space, add nothing
    if Index > WordStart then
      if (Index > LineLength) or (pos (CurrentChar, DELIMITERS) <> 0) then
        inc (Words);
    inc (Index); // move index past the current non-word char
  end;
  GetWords := Words;
end;
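
As a possible variation on the same idea (not something from the code above), both tests can be done with Pascal sets of char, which avoids pos() entirely. This is only a sketch under a few assumptions: the names CountWordsWithSets, WORD_CHARS and DELIM_CHARS are hypothetical, the delimiter set mirrors the DELIMITERS const with its trailing #9 read as a tab, and a word that runs to the end of the string is counted.

```pascal
program WordCountSketch;

const
  // Hypothetical names; DELIM_CHARS mirrors the DELIMITERS const,
  // with #9 assumed to mean the tab character.
  WORD_CHARS = ['a'..'z', 'A'..'Z', '0'..'9'];
  DELIM_CHARS = [' ', '.', ',', '!', '?', '_', '-', ')', '}', ']',
                 '>', ';', ':', '=', '@', '/', '\', #9];

function CountWordsWithSets (const S : string) : longint;
var
  Index, Words : longint;
  InWord : boolean;
begin
  Words := 0;
  InWord := false;
  for Index := 1 to length (S) do
    if S [Index] in WORD_CHARS then
      InWord := true
    else
    begin
      // a delimiter right after word characters closes a word;
      // any other character discards the run without counting it
      if InWord and (S [Index] in DELIM_CHARS) then
        inc (Words);
      InWord := false;
    end;
  // a word that runs to the end of the string has no trailing delimiter
  if InWord then
    inc (Words);
  CountWordsWithSets := Words;
end;

begin
  writeln (CountWordsWithSets ('Hello, world. Foo'));
end.
```

A set membership test does not scan a string, so its cost should not grow with the number of delimiters. Whether that makes a measurable difference on 20+ MB files would need timing on real data.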