[fpc-pascal]OT: Syntax Highlighting

Thu Nov 8 23:37:20 CET 2001

At 11/8/01 08:28 AM, you wrote:

Jim,

 > their. So, my question is this: does anyone know of another method for
 > syntax highlighting other than using pos() on each line? Is there a way to
 > make pos() faster? Is there something better than pos() to check if a
 > string resides within another string?

You should stick to simple keyword highlighting in the beginning; real
syntax highlighting is a very complicated business, requiring theoretical
and practical familiarity with many concepts used in compiler design.

But even for finding keywords alone, pos() or anything similar is a
completely impractical approach: it rescans the whole line once for every
keyword, and it also reports keywords embedded in longer words (for example,
"do" inside "done"). What you have to do instead is write a lexical scanner
that reads your input and turns it into tokens. The details depend on the
language you want to display but, basically, it works as follows: you have a
character variable that holds the next character not yet consumed. You write
a function that returns the next token in the input (an abstract value such
as identifier, number, punctuation, etc.). This NextToken function examines
current-character and decides what to do:

Literal := ''
if end of input stream reached then
   return END_TOKEN
else if current-character = whitespace then
   repeat
     append current-character to Literal
     read next character into current-character
   until current-character is not whitespace
   return WHITESPACE_TOKEN
else if current-character = alphabetic then
   repeat
     append current-character to Literal
     read next character into current-character
   until current-character is not a letter, digit or underscore
   return KEYWORD_TOKEN
else if current-character = numeric then
   repeat
     append current-character to Literal
     read next character into current-character
   until current-character is not numeric
   return NUMBER_TOKEN
else if current-character = quote then
   repeat
     append current-character to Literal
     read next character into current-character
   until (current-character is quote) or (end of input reached)
   if current-character is quote then
     append current-character to Literal
     read next character into current-character
   return QUOTE_TOKEN
else
   append current-character to Literal
   read next character into current-character
   return OTHER_TOKEN
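For readers who want something executable, here is a minimal sketch of the
NextToken pseudocode above, written in Python only for illustration; the same
structure carries over directly to a Pascal function. The Scanner class, the
tokenize helper, None as the end-of-input marker and the single quote as the
string delimiter are all assumptions made for this example. It also treats a
doubled quote ('') inside a string as part of the string, as Pascal does.

```python
# Sketch of a NextToken scanner with one-character look-ahead.
# Assumptions (not from the original post): the Scanner class, None as the
# end-of-input marker, and ' as the string delimiter (Pascal style).

END, WHITESPACE, KEYWORD, NUMBER, QUOTE, OTHER = (
    "END", "WHITESPACE", "KEYWORD", "NUMBER", "QUOTE", "OTHER")


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.current = None
        self.read_next()              # prime the look-ahead character

    def read_next(self):
        """Move the next input character into self.current (None at end)."""
        if self.pos < len(self.text):
            self.current = self.text[self.pos]
            self.pos += 1
        else:
            self.current = None

    def consume_while(self, predicate):
        """Collect characters into a literal while predicate holds."""
        literal = ""
        while self.current is not None and predicate(self.current):
            literal += self.current
            self.read_next()
        return literal

    def next_token(self):
        """Return (kind, literal); self.current is left on the next unread char."""
        c = self.current
        if c is None:
            return END, ""
        if c.isspace():
            return WHITESPACE, self.consume_while(str.isspace)
        if c.isalpha() or c == "_":
            # identifiers may continue with digits and underscores
            return KEYWORD, self.consume_while(lambda ch: ch.isalnum() or ch == "_")
        if c.isdigit():
            return NUMBER, self.consume_while(str.isdigit)
        if c == "'":
            literal = ""
            # Pascal writes a quote inside a string as '', so keep going
            # while another quote directly follows the closing one.
            while self.current == "'":
                literal += "'"            # opening quote (or 2nd half of '')
                self.read_next()
                literal += self.consume_while(lambda ch: ch != "'")
                if self.current is None:  # unterminated string: stop at end
                    break
                literal += "'"            # closing quote
                self.read_next()
            return QUOTE, literal
        self.read_next()                  # any other single character
        return OTHER, c


def tokenize(text):
    """Call next_token repeatedly and collect all tokens before END."""
    scanner = Scanner(text)
    tokens = []
    while True:
        kind, literal = scanner.next_token()
        if kind == END:
            return tokens
        tokens.append((kind, literal))
```

Joining all the literals back together reproduces the input exactly, which is
a handy sanity check: a highlighter must never lose or duplicate characters.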

So, for each call of NextToken, you receive back an xxx_TOKEN value
describing what kind of token you have just found in the input stream, plus
the text of the token itself (a keyword, a number, a quoted string, etc.) in
the Literal string. You must always make sure that the function exits with
the next character of the input already read into current-character, so that
the next call of NextToken has something to work on (this is called
one-character look-ahead). Before you call NextToken for the first time (or,
if you scan each line separately, at the beginning of each line), you have to
read the first character into current-character yourself.

read first character into current-character
change to Normal color
repeat
   Token := NextToken
   if Token = KEYWORD_TOKEN then
     if Literal is a keyword then
       change to Keyword color
   else if Token = NUMBER_TOKEN then
     change to Number color
   else if Token = QUOTE_TOKEN then
     change to Quote color
   write Literal
   change back to Normal color
until Token = END_TOKEN
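The display loop can be sketched the same way. This hypothetical highlight
function assumes it receives (kind, literal) pairs like those a NextToken loop
returns, and it uses ANSI terminal escape sequences to stand in for "change to
... color"; a GUI editor would set text attributes instead. Nesting the keyword
check inside the KEYWORD branch keeps the number and quote tests attached to
the outer if, which avoids the dangling-else trap when this loop is transcribed
into Pascal.

```python
# Sketch of the highlighting loop. Assumptions: tokens arrive as
# (kind, literal) pairs, and ANSI SGR escape codes represent the colors.

KEYWORD_COLOR = "\033[1;34m"   # bold blue
NUMBER_COLOR = "\033[35m"      # magenta
QUOTE_COLOR = "\033[32m"       # green
NORMAL = "\033[0m"             # reset to the terminal's default


def highlight(tokens, keywords):
    """Return the text with colored keywords, numbers and strings.

    keywords is a set of lowercase reserved words; matching is
    case-insensitive, as it is for Pascal keywords.
    """
    out = []
    for kind, literal in tokens:
        color = None
        if kind == "KEYWORD":
            if literal.lower() in keywords:   # only real keywords get colored
                color = KEYWORD_COLOR
        elif kind == "NUMBER":
            color = NUMBER_COLOR
        elif kind == "QUOTE":
            color = QUOTE_COLOR
        if color:
            out.append(color + literal + NORMAL)   # switch back after the token
        else:
            out.append(literal)
    return "".join(out)


if __name__ == "__main__":
    sample = [("KEYWORD", "Begin"), ("WHITESPACE", " "), ("KEYWORD", "x"),
              ("OTHER", ":="), ("NUMBER", "1")]
    print(highlight(sample, {"begin", "end"}))
```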

When you receive a KEYWORD_TOKEN, you already know that you have a potential
keyword in Literal. Now you have to check whether it is one of the actual
keywords. Two data structures are practical for this: a sorted table of
keywords searched with binary search, or a hash table. In the first phase of
writing this program, you could simply use TSortedCollection to store your
keywords and look them up. It won't be optimal in terms of speed, but it is
already written and easy to use. Once it works, you can think about
replacing it with a faster solution.
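A sketch of both lookup strategies, assuming a small, illustrative subset of
Pascal reserved words and case-insensitive matching. Python's bisect module
stands in for a sorted collection searched by binary search (this is not
TSortedCollection itself), and a frozenset serves as the hash table.

```python
import bisect

# Illustrative subset of Pascal reserved words (an assumption, not the full list).
PASCAL_KEYWORDS = ["begin", "end", "if", "then", "else", "while", "do",
                   "repeat", "until", "for", "to", "var", "procedure", "function"]

SORTED_TABLE = sorted(PASCAL_KEYWORDS)    # sorted once, searched many times
HASH_TABLE = frozenset(PASCAL_KEYWORDS)   # Python's frozenset is a hash table


def is_keyword_sorted(word):
    """Binary search in the sorted table: O(log n) comparisons."""
    w = word.lower()                      # Pascal keywords are case-insensitive
    i = bisect.bisect_left(SORTED_TABLE, w)
    return i < len(SORTED_TABLE) and SORTED_TABLE[i] == w


def is_keyword_hashed(word):
    """Hash lookup: expected O(1) per word."""
    return word.lower() in HASH_TABLE
```

Words such as "done" and "beginning" are correctly rejected: the scanner hands
over whole words, so the lookup never sees the partial matches pos() reports.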

Bye,
    Gábor

-------------------------------------------------------------------
Gabor DEAK JAHN -- Budapest, Hungary.
E-mail: djg at tramontana.co.hu