[fpc-devel] Including Sorokin's TRegExpr in FPC

Michael Van Canneyt michael at freepascal.org
Tue Aug 2 11:58:38 CEST 2011



On Tue, 2 Aug 2011, Marco van de Voort wrote:

> In our previous episode, Michael Van Canneyt said:
>>>> I knew I recognised the name of the author. This code is already used in
>>>> lazarus and can be found in components/synedit/synregexpr.pas. I had to work
>>>> on it because the original code doesn't run on CPUs requiring alignment.
>>>> The patch is attached at http://bugs.freepascal.org/view.php?id=19109.
>>>>
>>>
>>> Still, this is good news: in Lazarus the license is MPL or GPL; now it
>>> can be modified LGPL.
>>
>> If Florian agrees (if I'm correct, he wrote the old unit), we can move the
>> old regexpr to oldregexpr, and move this one into its place.
>
> There are more contenders.
> Yesterday, on IRC, somebody (Rosseaux) offered a native regex unit with PCRE constructs:
>
> 19:52 < rosseaux> https://scm.fluktuation.net/svn/brre/ a feature-complete
>                  (with nearly all known regexp features from
>                  perl/pcre/etc.) and Unicode8.0-conform-and-UTF8-capable
>                  (as optional work mode in addition to the
>                  ansichar-bytewise-mode) bytecode-based regular expression
>                  engine for object pascal, it has two subengines, a
>                  backtracking NFA and a parallel threaded NFA (also called
>                  lazy-computed DFA), both engines are cascaded into
>                  each other, so that ReDoS attacks are still possible
>                  but no longer as easily exploitable as in some other
>                  regex engines. It's licensed under the LGPL with
>                  static-linking-exception.
> 20:00 < rosseaux> and it includes shift-or and boyer-moore (if >32
>                  subsearches for static simple regex patterns
>
> 20:46 < fpk> rosseaux: did you do any speed comparisons?
> 20:46 < rosseaux> not yet
> 20:47 < rosseaux> only feature comparison tests, but i'll do it in the next
>                  days
> 20:49 < fpk> nice work :)
> 20:49 < rosseaux> the parallel threaded non-backtracking NFA idea is based
>                  on the http://swtch.com/~rsc/regexp/regexp1.html
>                  article, which I've found with google months ago,
> 20:49 < rosseaux> thanks :)
> 20:52 < rosseaux> the UTF8 decoder stuff in BRRE is also a DFA machine and
>                  is position-state-hold-based until the whole regexp
>                  stuff is in the process, so the UTF8 support should be
>                  faster than in some other regexp engines with
>                  non-complete UTF8 support such as PCRE and so on.
> 20:55 < rosseaux> and an already-compiled BRRE regex can be used in
>                  multiple CPU threads at the same time, so it's
>                  semi-threadsafe in this sense.
> 20:56 < rosseaux> so https://anonymous@scm.fluktuation.net/svn/brre/  that
>                  should be working now
>
>
> That being said, there is probably room for two packages.

Hmmm. Yes.

But the units would have to be named differently anyhow,
to avoid a mess like the one we had with the apache headers.

In order to avoid future name clashes, I would propose to prefix
the 'native FPC' one with 'fp'.

Which one that is, is largely irrelevant to me. 
I haven't had the need for regular expressions yet :)

Michael.


