[fpc-devel] Including Sorokin's TRegExpr in FPC
Marco van de Voort
marcov at stack.nl
Tue Aug 2 11:21:15 CEST 2011
In our previous episode, Michael Van Canneyt said:
> >> I knew I recognised the name of the author. This code is already used in
> >> lazarus and can be found in components/synedit/synregexpr.pas. I had to work
> >> on it because the original code doesn't run on cpu's requiring alignment.
> >> The patch is attached at http://bugs.freepascal.org/view.php?id=19109.
> >>
> >
> > Still, this is good news, in Lazarus the license is MPL or GPL, now it
> > can be modified LGPL.
>
> If Florian agrees (if I'm correct, he wrote the old unit), we can move the
> old regexpr to oldregexpr, and move this one into its place.
There are more contenders.
Yesterday, on IRC, sb (Rosseaux) offered a native regex unit with PCRE constructs:
19:52 < rosseaux> https://scm.fluktuation.net/svn/brre/ a feature-complete
(with mostly all known regexp features from
perl/pcre/etc.) and Unicode8.0-conform-and-UTF8-capable
(as optional work mode in addition to the
ansichar-bytewise-mode) bytecode-based regular expression
engine for object pascal, it has two subengines, a
backtracking NFA and a parallel threaded NFA (also called
lazy-computed DFA), both engines are cascaded in
each another, so that ReDoS
19:52 < rosseaux> attacks are still possible but not more so easy
exploitable as like in some other regex engines. It's
licensed under the LGPL with static-linking-exception.
20:00 < rosseaux> and it includes shift-or and boyer-moore (if >32
subsearchs for static simple regex patterns
20:46 < fpk> rosseaux: did you do any speed comparisons?
20:46 < rosseaux> not yet
20:47 < rosseaux> only feature comparison tests, but i'll do it in the next
days
20:49 < fpk> nice work :)
20:49 < rosseaux> the parallel threaded non-backtracking NFA idea is based
on the http://swtch.com/~rsc/regexp/regexp1.html
article, which I've found with google months ago,
20:49 < rosseaux> thanks :)
20:52 < rosseaux> the UTF8 decoder stuff in BRRE is also a DFA machine and
is position-state-hold-based until the whole regexp
stuff is in the process, so the UTF8 support should be
faster than in some other regexp engines with
non-complete UTF8 support just as PCRE and so on.
20:55 < rosseaux> and a already-done-compiled BRRE regex can be used in
multiple CPU threads at the same time, so it's
semi-threadsafe in this sense.
20:56 < rosseaux> so https://anonymous@scm.fluktuation.net/svn/brre/ that
should working now
That being said, there is probably room for two packages.
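
For anyone unfamiliar with the "parallel threaded NFA" mentioned in the log, here is a minimal sketch of the idea in Python. It is not BRRE's code, and I have not checked how BRRE implements it. The instruction names (char, split, jmp, match) and the helper functions are made up for illustration. All live NFA states are advanced in lockstep over the input, so a pattern such as (a?){n}a{n}, which makes a backtracking matcher take exponential time, is handled in time proportional to pattern size times input length.

```python
# Instruction tuples of a tiny, hypothetical regex bytecode:
#   ("char", c)      consume one character equal to c
#   ("split", x, y)  continue at both x and y
#   ("jmp", x)       continue at x
#   ("match",)       the whole input matched


def add_thread(prog, threads, seen, pc):
    """Add pc to the thread list, following jmp/split without consuming input."""
    stack = [pc]
    while stack:
        pc = stack.pop()
        if pc in seen:
            continue  # each instruction is visited at most once per step
        seen.add(pc)
        op = prog[pc]
        if op[0] == "jmp":
            stack.append(op[1])
        elif op[0] == "split":
            stack.append(op[2])
            stack.append(op[1])
        else:
            threads.append(pc)


def lockstep_match(prog, text):
    """Return True if prog matches all of text; O(len(prog) * len(text))."""
    threads = []
    add_thread(prog, threads, set(), 0)
    for ch in text:
        next_threads, seen = [], set()
        for pc in threads:
            op = prog[pc]
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, next_threads, seen, pc + 1)
        threads = next_threads
        if not threads:
            return False  # no state survived this character
    return any(prog[pc][0] == "match" for pc in threads)


def pathological(n):
    """Bytecode for (a?){n}a{n}, a classic backtracking worst case."""
    prog = []
    for _ in range(n):
        pc = len(prog)
        prog.append(("split", pc + 1, pc + 2))  # take the 'a' or skip it
        prog.append(("char", "a"))
    prog.extend([("char", "a")] * n)
    prog.append(("match",))
    return prog


if __name__ == "__main__":
    n = 25
    print(lockstep_match(pathological(n), "a" * n))
```

The thread list never holds more entries than there are instructions, which is what bounds the work per input character. A backtracking engine tries the alternatives one at a time instead, which is where the ReDoS exposure comes from.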