[fpc-pascal] Last missing benchmark: regex-dna

Marco van de Voort marcov at stack.nl
Fri Oct 6 12:32:52 CEST 2006


> > Although fpc has a regexpr unit:
> > http://svn.freepascal.org/svn/fpc/trunk/packages/base/regexpr/regexpr.pp
> > It has many todos, such as adding support for | in the search expression. So this 
> > unit doesn't have enough functionality.
> 
> While '|' support is to be considered basic regex functionality, what is the really expected functionality?
> 
> Basic seems to be: |()?*+ (non-UNICODE) support (from wikipedia).

`|' is not part of "basic" regular expressions, afaik. From the BSD re_format manpage:

     Obsolete (``basic'') regular expressions differ in several respects.  `|'
     is an ordinary character and there is no equivalent for its
     functionality.  `+' and `?' are ordinary characters, and their
     functionality can be expressed using bounds (`{1,}' or `{0,1}'
     respectively).  Also note that `x+' in modern REs is equivalent to
     `xx*'.  The delimiters for bounds are `\{' and `\}', with `{' and `}'
     by themselves ordinary characters.  The parentheses for nested
     subexpressions are `\(' and `\)', with `(' and `)' by themselves
     ordinary characters.  `^' is an ordinary character except at the
     beginning of the RE or the beginning of a parenthesized
     subexpression, `$' is an ordinary character except at the end of the
     RE or the end of a parenthesized subexpression, and `*' is an
     ordinary character if it appears at the beginning of the RE or the
     beginning of a parenthesized subexpression (after a possible leading
     `^').  Finally, there is one new type of atom, a back reference: `\'
     followed by a non-zero decimal digit d matches the same sequence of
     characters matched by the dth parenthesized subexpression (numbering
     subexpressions by the positions of their opening parentheses, left
     to right), so that (e.g.) `\([bc]\)\1' matches `bb' or `cc' but not
     `bc'.
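
To make the equivalences in the manpage excerpt concrete, here is a small sketch. It is in Python rather than Pascal, and Python's re module only speaks the modern syntax, so it does not exercise a real POSIX basic-RE engine; it only checks the modern-side behaviour the excerpt describes. The helper name full() and the sample strings are my own, purely for illustration.

```python
import re

# Assumption: Python's re module stands in for a "modern" RE engine here.
# It does not implement POSIX basic REs, so only modern syntax is shown.

def full(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

samples = ["", "x", "xx", "xxx", "y"]

# `x+' is equivalent to `xx*' and to the bound `x{1,}'.
plus_equiv = all(
    full(r"x+", s) == full(r"xx*", s) == full(r"x{1,}", s) for s in samples
)

# `x?' is equivalent to the bound `x{0,1}'.
opt_equiv = all(full(r"x?", s) == full(r"x{0,1}", s) for s in samples)

# Back reference: \1 must repeat exactly what group 1 matched.
backref = {s: full(r"([bc])\1", s) for s in ["bb", "cc", "bc"]}

# In modern syntax `|' is alternation; escaped, it is an ordinary character.
alternation = full(r"a|b", "a") and full(r"a|b", "b")
literal_bar = full(r"a\|b", "a|b") and not full(r"a\|b", "a")

print(plus_equiv, opt_equiv, backref, alternation, literal_bar)
```

So a unit that only offers bounds and back references can express `+' and `?', but nothing in the basic syntax stands in for `|'.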



