Add known upper limit to capture search. #217

Merged 1 commit on Apr 27, 2016

BurntSushi (Member)

The DFA reports the end location of a match, so we should pass that
along to capture detection. In theory, the DFA and the NFA report the
same match locations, so this upper bound shouldn't be necessary: the
NFA should quit once it finds the right match. It turns out, though,
that bounding the text has two important ramifications:

  1. It will enable the backtracking engine to be used more often. In
    particular, the backtracking engine can only be used on small inputs and
    this change decreases the size of the input by only considering the
    match.
  2. The backtracking engine must start every search by zeroing memory
    that is proportional to the size of the input. If the input is smaller,
    then this runs more quickly.
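
To make the two points above concrete, here is a rough sketch of the kind of size check a bounded backtracker might perform. The names `visited_bytes` and `can_backtrack` and the 256 KiB cap are assumptions made for illustration; they are not taken from this PR or from the crate's source.

```rust
/// Hypothetical cap on the backtracker's "visited" bitset, in bytes.
/// The real limit used by the crate may differ.
const MAX_VISITED_BYTES: usize = 256 * 1024;

/// Bytes needed for a bitset with one bit per (instruction, input position)
/// pair. This is the memory that must be zeroed before every search.
fn visited_bytes(num_insts: usize, text_len: usize) -> usize {
    let bits = num_insts * (text_len + 1);
    (bits + 7) / 8
}

/// The backtracker is only usable when its bitset fits under the cap.
fn can_backtrack(num_insts: usize, text_len: usize) -> bool {
    visited_bytes(num_insts, text_len) <= MAX_VISITED_BYTES
}
```

Under these assumptions, a 50-instruction program over a 1,000,000-byte input is too large for the backtracker, while the same program over the first 100 bytes (the span the DFA already identified) needs only a few hundred bytes of zeroed memory.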

We are also careful to bound the text at one additional "character"
past the end of the match, so that lookahead operators work correctly.
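
As an illustration of that extra "character", the bound could be computed along these lines. This is a minimal sketch over `&str`, assuming `match_end` falls on a UTF-8 character boundary; the helper name `bound_for_captures` is hypothetical and does not appear in the PR.

```rust
/// Returns the prefix of `text` that ends one character past `match_end`,
/// so a lookahead assertion at the end of the match can still inspect the
/// character that follows. If the match ends at the end of `text`, the
/// whole text is returned.
///
/// Assumes `match_end` lies on a UTF-8 character boundary.
fn bound_for_captures(text: &str, match_end: usize) -> &str {
    match text[match_end..].chars().next() {
        // Extend the bound by the encoded length of the next character.
        Some(c) => &text[..match_end + c.len_utf8()],
        // Nothing follows the match; there is nothing to trim.
        None => text,
    }
}
```

For example, with a match ending at offset 3 in `"foo bar"`, the capture search would see `"foo "` instead of the full input.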

See also: #215.

  self.captures_nfa(
-     MatchNfaType::Auto, slots, text, s)
+     MatchNfaType::Auto, slots, &text[..e], s)

Should this change also be applied above in the MatchType::Literal case?

(And below in the DfaAnchoredReverse case?)

Member Author

Good catch on the literal case. For anchored reverse it won't help, because if there's a match, it's guaranteed to end at the end of the input (that case only happens when the regex is anchored to the end of the text).

I see, thanks!

BurntSushi merged commit 3f408e5 into master on Apr 27, 2016.
BurntSushi deleted the fix-capture-perf branch on April 27, 2016 at 14:34.