Re[2]: [fpc-pascal] Last missing benchmark: regex-dna
Пётр Косаревский
ppkk at mail.ru
Fri Oct 6 14:18:03 CEST 2006
> > Basic seems to be: |()?*+ (non-UNICODE) support (from wikipedia).
> | is not basic afaik. From re_format BSD Manpage:
> Obsolete (``basic'') regular expressions differ in several respects. `|'
> is an ordinary character and there is no equivalent for its
> functionality. `+' and `?' are ordinary characters, and their functionality can
> be expressed using bounds (`{1,}' or `{0,1}' respectively). Also note
> that `x+' in modern REs is equivalent to `xx*'. The delimiters for
> bounds are `\{' and `\}', with `{' and `}' by themselves ordinary
> characters. The parentheses for nested subexpressions are `\(' and
> `\)', with `(' and `)' by themselves ordinary characters. `^' is an
> ordinary character except at the beginning of the RE or the beginning
> of a parenthesized subexpression, `$' is an ordinary character except
> at the end of the RE or the end of a parenthesized subexpression, and
> `*' is an ordinary character if it appears at the beginning of the RE
> or the beginning of a parenthesized subexpression (after a possible
> leading `^'). Finally, there is one new type of atom, a back
> reference: `\' followed by a non-zero decimal digit d matches the
> same sequence of characters matched by the dth parenthesized
> subexpression (numbering subexpressions by the positions of their
> opening parentheses, left to right), so that (e.g.) `\([bc]\)\1'
> matches `bb' or `cc' but not `bc'.
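The two concrete claims in the quoted text (`x+' equals `xx*', and the back reference example) are easy to check. A minimal sketch, assuming Python's "re" module as a stand-in engine; it uses the modern syntax, so the group is written `(...)` rather than `\(...\)`, and the helper name same_matches is made up for this illustration:

```python
import re

def same_matches(p1, p2, samples):
    """True if both patterns accept/reject every sample identically (full match)."""
    return all(
        bool(re.fullmatch(p1, s)) == bool(re.fullmatch(p2, s))
        for s in samples
    )

# `x+' in modern REs should behave like `xx*' on these hand-picked samples.
plus_equiv = same_matches(r"x+", r"xx*", ["", "x", "xx", "xxx", "y", "xy"])

# Back reference: \1 must repeat exactly what group 1 matched.
backref = re.compile(r"([bc])\1")
backref_results = {s: bool(backref.fullmatch(s)) for s in ["bb", "cc", "bc"]}

print(plus_equiv)
print(backref_results)
```

This only compares behaviour on a handful of samples; it is not a proof of equivalence and says nothing about how FPC's "regexpr" treats the same patterns.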
You quoted the traditional Unix syntax; I quoted Wikipedia's definition of "basic regex", which looks like a simplified POSIX ERE (extended regular expression) syntax ( http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html ).
I see what you mean.
The "last missing benchmark" needs only the simplest bracket expressions, such as "[agt]" (equivalent to "(a|g|t)"). FPC's "regexpr" supports bracket expressions well enough for that.
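To illustrate the equivalence of the bracket expression and the alternation, here is a small sketch. It assumes Python's "re" module rather than FPC's "regexpr", and the SAMPLE string is made up for the example, not taken from the benchmark input:

```python
import re

# Hypothetical input, not real benchmark data.
SAMPLE = "ggtaacgt"

# Positions matched by the bracket expression...
bracket_hits = [m.span() for m in re.finditer(r"[agt]", SAMPLE)]
# ...and by the equivalent alternation.
alt_hits = [m.span() for m in re.finditer(r"(a|g|t)", SAMPLE)]

# Both forms should find the same single-character matches.
print(bracket_hits == alt_hits)
```

So an engine that lacks `|' but handles simple bracket expressions should still be able to express this kind of pattern.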
It also requires substitution (find and replace all matches of a regex in a string), which has nothing to do with regex standards.
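Replace-all can be layered on top of any "find next match" primitive, which is why it is independent of the regex syntax itself. A rough sketch, again assuming Python's "re" module for the matching part; replace_all is a hypothetical helper written for this illustration and treats the replacement as a literal string (no group references):

```python
import re

def replace_all(pattern, replacement, text):
    """Replace every non-overlapping match of pattern in text with a literal string."""
    out = []
    pos = 0
    for m in re.finditer(pattern, text):
        out.append(text[pos:m.start()])  # unmatched text before this match
        out.append(replacement)          # literal replacement
        pos = m.end()                    # continue after the match
    out.append(text[pos:])               # trailing unmatched text
    return "".join(out)

# Made-up input: every a, g or t becomes N.
result = replace_all(r"[agt]", "N", "cagtc")
print(result)  # cNNNc
```

The same loop should carry over to any engine that reports where a match starts and ends.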