[fpc-pascal] regex unit and word boundaries

Ben Smith ben.smith.lists at gmail.com
Wed Apr 6 10:43:55 CEST 2011


On Wed, Apr 6, 2011 at 9:32 AM, ik wrote:
>
> What's wrong with /a word/ (without the slashes)?

Sorry, I don't understand.


> But if the word can exist in the middle of a text, and you are not looking
> for a pattern, then regex is not what you should use, but "pos" instead,
> because pos is more efficient than regex.

I'm implementing syntax highlighting in one of my text edit
components. I am using regex to find keywords, reserved words, etc. to
highlight.

So when I search for 'class', it must not match 'Classes', so I can't
use something as rudimentary as Pos().

Normally you enable word boundaries in your regex as follows, to
accomplish what I need:

     \b(class|record|begin|end)\b
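
To make the difference concrete, here is a minimal sketch of the
word-boundary search. It is written in Python purely for illustration
and uses Python's "re" flavour, not the FCL regex unit or TRegExpr. The
helper name find_keywords is hypothetical, and the case-insensitive flag
is an assumption (Pascal keywords are case-insensitive).

```python
import re

# The pattern from the message; IGNORECASE is an assumption, added
# because Pascal keywords are case-insensitive.
KEYWORDS = re.compile(r"\b(class|record|begin|end)\b", re.IGNORECASE)


def find_keywords(line):
    """Return (start, end, text) for every whole-word keyword in line."""
    return [(m.start(), m.end(), m.group(0)) for m in KEYWORDS.finditer(line)]


if __name__ == "__main__":
    # A plain substring search (what Pos() does) hits inside 'Classes'.
    print("classes".find("class") != -1)
    # The \b version skips 'Classes' and reports only 'class' and 'end'.
    print(find_keywords("uses Classes; type TFoo = class end;"))
```

The start and end offsets are what a highlighter would use to colour
the matched range.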

Alternatively (if \b is not available), I can do something like:

     ^\s*end;?\s*$

which will match 'end;' and 'end' on a line of their own, but not
something like 'amend'.
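
A quick check of that fallback pattern, again as a Python sketch using
Python's "re" flavour rather than any Pascal unit. The helper name
is_end_line is hypothetical, and the case-insensitive flag is an
assumption.

```python
import re

# Fallback pattern from the message: 'end' alone on a line, with an
# optional trailing semicolon and optional surrounding whitespace.
END_LINE = re.compile(r"^\s*end;?\s*$", re.IGNORECASE)


def is_end_line(line):
    """True if the line holds nothing but 'end' or 'end;'."""
    return END_LINE.match(line) is not None


if __name__ == "__main__":
    for sample in ["end;", "  end  ", "amend", "begin end;"]:
        print(repr(sample), is_end_line(sample))
```

The last sample shows the limitation: the anchors only find a keyword
that sits alone on its line, so this is not a general substitute for \b.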

But alas, the FCL regex unit doesn't seem to have \s (any whitespace
character) implemented either. The unit appears to be in its infancy,
and I need a more feature-complete regex implementation.

I did some googling, and the TRegExpr class library by Andrey V. Sorokin
seems a lot more feature-complete, and it is free. I will probably have
to switch to using that component. A shame really, because I like to
stick to using units included with FPC. Doing some more searching, I
believe the Lazarus IDE also uses the TRegExpr unit instead of FCL's one.

Is anybody still working on the FCL regex unit? Are there plans to
implement any more regex features, or is that unit abandonware? Can't
the FPC developers include the TRegExpr library as part of FCL? It would
save them a lot of development effort.


-- 

              Ben.


