[fpc-pascal]OT: Syntax Highlighting
Gabor DEAK JAHN
djg at tramontana.co.hu
Thu Nov 8 23:37:20 CET 2001
At 11/8/01 08:28 AM, you wrote:
Jim,
> their. So, my question is this; does anyone know of another method for
> syntax highlighting other then using pos() on each line? Is there a way to
> make pos() faster? Is there something better then pos() to check if a
> string resides within another string?
You should stick to simple keyword highlighting in the beginning. Real
syntax highlighting is a very complicated business, requiring theoretical
and practical familiarity with many concepts used in compiler design.
But even just for finding keywords, pos() or anything similar is a completely
impractical approach: it rescans every line once per keyword, and it also
finds keywords inside longer identifiers, strings and comments. What you
have to do is to write a lexical scanner that scans your input and turns it
into tokens. The details also depend on the language you want to display
but, basically, it is as follows: you have a character variable that holds
the next character not yet consumed. You write a function that returns the
next token in the input (an abstract value like identifier, number,
punctuation, etc). This NextToken function examines the current-character
and decides what to do:
  Literal := ''
  if end of input stream reached then
    return END_TOKEN
  else if current-character = whitespace then
    repeat
      append current-character to Literal
      read next character into current-character
    until current-character is not whitespace
    return WHITESPACE_TOKEN
  else if current-character = alphabetic then
    repeat
      append current-character to Literal
      read next character into current-character
    until current-character is not alphabetic
    return KEYWORD_TOKEN
  else if current-character = numeric then
    repeat
      append current-character to Literal
      read next character into current-character
    until current-character is not numeric
    return NUMBER_TOKEN
  else if current-character = quote then
    repeat
      append current-character to Literal
      read next character into current-character
    until current-character is quote or end of input stream reached
    if current-character is quote then
      append current-character to Literal
      read next character into current-character
    return QUOTE_TOKEN
  else
    append current-character to Literal
    read next character into current-character
    return OTHER_TOKEN
So, for each call of NextToken, you receive back an xxx_TOKEN value
describing what kind of token you have just found in the input stream, plus
the actual text of the token (the keyword, number, etc.) in the Literal
string. You always have to make sure that you leave this function with the
next character of the input already read in and stored into
current-character, so that the next call of NextToken will have something to
work on (this is called one-character look-ahead).
Before you call NextToken for the first time (or, if you scan each line
separately, at the beginning of each line), you have to read the first
character into current-character yourself.
  read first character into current-character
  change to Normal color
  repeat
    Token := NextToken
    if Token = KEYWORD_TOKEN then
      if Literal is a keyword then
        change to Keyword color
    else if Token = NUMBER_TOKEN then
      change to Number color
    else if Token = QUOTE_TOKEN then
      change to Quote color
    write Literal
    change back to Normal color
  until Token = END_TOKEN
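
To make the two pieces of pseudocode above more concrete, here is a rough
Free Pascal sketch of the scanner and the driver loop together. Treat it as
one possible shape, not a finished highlighter. Several things in it are my
own assumptions: the names (TTokenKind, ReadChar, Consume, StartScan,
Highlight), scanning from a string instead of a file, using #0 as the
end-of-input marker, letting words continue with digits and underscores, and
writing <k>, <n> and <s> markers where a real program would change colors
(for example with TextColor from the Crt unit). IsKeyword is only a stand-in
with a tiny illustrative list.

```pascal
program HighlightSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  TTokenKind = (tkEnd, tkWhitespace, tkWord, tkNumber, tkQuote, tkOther);

const
  EndChar = #0;  // assumption: #0 never occurs in the input, so it marks the end

var
  Source: string;   // text being scanned
  SrcPos: Integer;  // index of CurChar within Source
  CurChar: Char;    // the one-character look-ahead
  Literal: string;  // text of the token found by the last NextToken call

// Load the next character into CurChar, or EndChar once the input is used up.
procedure ReadChar;
begin
  Inc(SrcPos);
  if SrcPos <= Length(Source) then
    CurChar := Source[SrcPos]
  else
    CurChar := EndChar;
end;

// Append CurChar to Literal and advance the look-ahead.
procedure Consume;
begin
  Literal := Literal + CurChar;
  ReadChar;
end;

// Prepare scanning of S: this reads the first character into CurChar.
procedure StartScan(const S: string);
begin
  Source := S;
  SrcPos := 0;
  ReadChar;
end;

function NextToken: TTokenKind;
begin
  Literal := '';
  case CurChar of
    EndChar:
      Result := tkEnd;
    ' ', #9, #10, #13:
      begin
        repeat Consume until not (CurChar in [' ', #9, #10, #13]);
        Result := tkWhitespace;
      end;
    'A'..'Z', 'a'..'z', '_':
      begin
        repeat Consume until not (CurChar in ['A'..'Z', 'a'..'z', '_', '0'..'9']);
        Result := tkWord;
      end;
    '0'..'9':
      begin
        repeat Consume until not (CurChar in ['0'..'9']);
        Result := tkNumber;
      end;
    '''':
      begin
        // Stop at the closing quote, or at end of input if it is missing.
        repeat Consume until (CurChar = '''') or (CurChar = EndChar);
        if CurChar = '''' then Consume;
        Result := tkQuote;
      end;
  else
    begin
      Consume;
      Result := tkOther;
    end;
  end;
end;

// Stand-in keyword test: a linear scan over a tiny, illustrative list.
function IsKeyword(const S: string): Boolean;
const
  Keywords: array[0..4] of string = ('begin', 'do', 'end', 'if', 'then');
var
  I: Integer;
begin
  Result := False;
  for I := Low(Keywords) to High(Keywords) do
    if CompareText(S, Keywords[I]) = 0 then
      Exit(True);
end;

// Returns S with keywords, numbers and strings wrapped in markers
// that stand in for the color changes.
function Highlight(const S: string): string;
var
  Token: TTokenKind;
begin
  Result := '';
  StartScan(S);
  repeat
    Token := NextToken;
    if (Token = tkWord) and IsKeyword(Literal) then
      Result := Result + '<k>' + Literal + '</k>'
    else if Token = tkNumber then
      Result := Result + '<n>' + Literal + '</n>'
    else if Token = tkQuote then
      Result := Result + '<s>' + Literal + '</s>'
    else
      Result := Result + Literal;
  until Token = tkEnd;
end;

begin
  WriteLn(Highlight('if x = 10 then s := ''hi'';'));
end.
```

For the sample line this should print
<k>if</k> x = <n>10</n> <k>then</k> s := <s>'hi'</s>;
Note that every character of the input is looked at exactly once, no matter
how many keywords you have.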
When you receive a KEYWORD_TOKEN, you already know that you have a potential
keyword in Literal. Now you have to check whether it is one of the actual
keywords. Two data structures are practically useful for this: a sorted
table of keywords with binary search, or a hash table. In the first phase of
writing this program, you could simply use TSortedCollection to store your
keywords and to look them up. It won't be optimal in terms of speed but it
is already written and easy to use. Once it works, you can think about
replacing it with a faster solution.
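
If you later want to try the sorted table with binary search, a minimal
sketch could look like the one below. It uses a plain constant array instead
of TSortedCollection, and both the IsKeyword name and the keyword list are
only illustrative (the list is a small subset of Pascal's reserved words).
The comparison is case-insensitive, since Pascal keywords are, and the table
has to stay sorted or the search will miss entries.

```pascal
program KeywordLookupSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

const
  // Must stay in alphabetical order for the binary search to work.
  // Only an illustrative subset of Pascal's reserved words.
  Keywords: array[0..9] of string = (
    'begin', 'case', 'do', 'else', 'end',
    'for', 'if', 'then', 'var', 'while');

// Case-insensitive binary search in the sorted Keywords table.
function IsKeyword(const S: string): Boolean;
var
  Left, Right, Middle, Cmp: Integer;
begin
  Left := Low(Keywords);
  Right := High(Keywords);
  while Left <= Right do
  begin
    Middle := (Left + Right) div 2;
    Cmp := CompareText(S, Keywords[Middle]);
    if Cmp = 0 then
      Exit(True)
    else if Cmp < 0 then
      Right := Middle - 1   // S sorts before the middle entry
    else
      Left := Middle + 1;   // S sorts after the middle entry
  end;
  Result := False;
end;

begin
  WriteLn(IsKeyword('Begin'));
  WriteLn(IsKeyword('while'));
  WriteLn(IsKeyword('foo'));
end.
```

The first two calls report true and the last one false. With ten keywords a
lookup needs at most four comparisons, and with a hundred at most seven.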
Bye,
Gábor
-------------------------------------------------------------------
Gabor DEAK JAHN -- Budapest, Hungary.
E-mail: djg at tramontana.co.hu