Incorrect match behavior on $ #557
Comments
I believe your analysis on what's expected is correct. Here's a much smaller reproduction: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=30dfc1d0a4d9c158c2dfe55fed32331b

The first and third results are correct, but the second one is not. I do believe there is a duplicate bug for this, but I'd want to confirm the root cause first. It could be a while before this gets fixed.
To your test cases I would add:
which does not match. So the presence of the trailing |
Hi. Lines 873 to 887 in 60d087a
which in exec.rs As an example, execute the following sample code: In this example, the submatches are captured over the range of the match result after matching has been performed in the DFA. Lines 563 to 575 in 60d087a
Matching against $ is done when capturing the submatches, but this check judges whether the position is the end of the string passed to Line 188 in 60d087a
Also, #334 was reported as an issue related to this part of the code. That issue is already closed, but the problem can still be confirmed if the regular expression is changed slightly, like this: This seems to have the same cause as the present issue. To solve this problem, I think it is necessary to use the length of the original text when processing EndLine and EndText.
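The slicing problem described in this comment can be sketched in a few lines of plain Rust. This is a simplified model and not the regex crate's code: `is_end_text`, `end_text_after_slicing`, and `end_text_with_explicit_end` are hypothetical names, and the only assumption taken from the discussion is that the EndText check compares the current position with the length of the text it is handed.

```rust
/// Simplified model of an `EndText` (`$`) check: the assertion holds only
/// when `pos` is the end of the text the matcher was given.
/// Hypothetical helper, not part of the regex crate.
fn is_end_text(text: &[u8], pos: usize) -> bool {
    pos == text.len()
}

/// Problematic shape: the text is sliced at the match end before the capture
/// pass, so `end` always looks like the end of the text.
fn end_text_after_slicing(text: &[u8], end: usize) -> bool {
    let sliced = &text[..end];
    is_end_text(sliced, end)
}

/// Suggested shape: the capture pass keeps the full text and receives `end`
/// as a separate bound, so `$` is still checked against the original length.
fn end_text_with_explicit_end(text: &[u8], end: usize) -> bool {
    is_end_text(text, end)
}
```

For the text `aaz` and an illustrative end bound of 2, the sliced variant reports that position 2 is the end of the text, while the variant that keeps the original text does not.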
When performing "EndText" matching, it is necessary to check whether the current position equals the input text length. However, when capturing a submatch using the match result of the DFA, "EndText" matching wasn't actually performed correctly because the input text was sliced. With this patch, the match end position is specified by the argument "end" instead of slicing the text when performing capture with the match result of the DFA. Fixes #557, Closes #561
@BurntSushi Thank you for reviewing and merging!
This fixes a bug introduced by a bug fix for #557. In particular, the termination condition wasn't exactly right, and this appears to have slipped through the test suite. This probably reveals a hole in our test suite, which is specifically the testing of Unicode regexes with bytes::Regex on invalid UTF-8. This bug was originally reported against ripgrep: BurntSushi/ripgrep#1247
Rust version:
This might be a duplicate of some other already-open bugs, but it's hard to guess at root causes from symptoms.
Consider the regex `/(aa$)?/` and the input "aaz", using a partial match. The docs say that `$` should match "the end of text (or end-of-line with multi-line mode)". The final `?` means that the regex engine can partial-match any string: either strings that end in "aa" (capturing "aa") or any string (capturing nothing). Since the input "aaz" does not end in "aa":
Here's the kernel of my test program:
This is the behavior on the regex and input described above:
In this case, Rust is unique among the 8 languages I tried. Perl, PHP, Java, Ruby, Go, JavaScript (Node-V8), and Python all match with an empty/null capture.
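The result those other engines give can be modeled by hand for this one pattern. The sketch below is plain Rust with no regex engine involved; `Outcome` and `match_optional_aa_at_end` are hypothetical names, and the function hard-codes `(aa$)?` attempted at position 0 without multi-line mode. It only illustrates the expected semantics and says nothing about how any engine implements them.

```rust
/// Result of matching `(aa$)?` at position 0: the overall match span and the
/// span of capture group 1, if the group participated in the match.
#[derive(Debug, PartialEq)]
struct Outcome {
    overall: (usize, usize),
    group1: Option<(usize, usize)>,
}

/// Hand-written model of `(aa$)?` anchored at position 0 (hypothetical).
fn match_optional_aa_at_end(text: &str) -> Outcome {
    // Greedy `?`: try the group first. It needs "aa" immediately followed by
    // the end of the text, which at position 0 means the text is exactly "aa".
    if text.starts_with("aa") && text.len() == 2 {
        return Outcome { overall: (0, 2), group1: Some((0, 2)) };
    }
    // Otherwise the optional group is skipped: an empty match at position 0
    // with no capture.
    Outcome { overall: (0, 0), group1: None }
}
```

Under this model, `match_optional_aa_at_end("aaz")` yields an empty overall match with `group1: None`, which corresponds to the empty/null capture reported for the other languages.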