-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected (or incorrect?) multiline replace behavior #1311
Comments
Actually, this has to be a mere oversight because the following does work:
and I can even trick rg into giving me the behavior I'm looking for with this:
(although that fails when there are empty lines in the input) |
My understanding is that You might feel |
Yes, and that's why I would like |
This is likely a bug in how new lines are handled by the printer. I don't know when I'll get around to it. |
Thanks for the confirmation anyway, @BurntSushi! |
Just found this bug myself. Here's some more weirdness in case it points to the source of the bug.
|
This commit fixes a subtle bug in multi-line replacement of line terminators. The problem is that even though ripgrep supports multi-line searches, it is *still* line oriented. It still needs to print line numbers, for example. For this reason, there are various parts in the printer that iterate over lines in order to format them into the desired output. This turns out to be problematic in some cases. #1311 documents one of those cases (with line numbers enabled to highlight a point later): $ printf "hello\nworld\n" | rg -n -U "\n" -r "?" 1:hello? 2:world? But the desired output is this: $ printf "hello\nworld\n" | rg -n -U "\n" -r "?" 1:hello?world? At first I had thought that the main problem was that the printer was taking ownership of writing line terminators, even if the input already had them. But it's more subtle than that. If we fix that issue, we get output like this instead: $ printf "hello\nworld\n" | rg -n -U "\n" -r "?" 1:hello?2:world? Notice how '2:' is printed before 'world?'. The reason it works this way is because matches are reported to the printer in a line oriented way. That is, the printer gets a block of lines. The searcher guarantees that all matches that start or end in any of those lines also end or start in another line in that same block. As a result, the printer uses this assumption: once it has processed a block of lines, the next match will begin on a new and distinct line. Thus, things like '2:' are printed. This is generally all fine and good, but an impedance mismatch arises when replacements are used. Because now, the replacement can be used to change the "block of lines" approach. Now, in terms of the output, the subsequent match might actually continue the current line since the replacement might get rid of the concept of lines altogether. We can sometimes work around this. For example: $ printf "hello\nworld\n" | rg -U "\n(.)?" -r '?$1' hello?world? Why does this work? It's because the '(.)' after the '\n' causes the match to overlap between lines. Thus, the searcher guarantees that the block sent to the printer contains every line. And there in lay the solution: all we need to do is tweak the multi-line searcher so that it combines lines with matches that directly adjacent, instead of requiring at least one byte of overlap. Fixing that solves the issue above. It does cause some tests to fail: * The binary3 test in the searcher crate fails because adjacent line matches are now one part of block, and that block is scanned for binary data. To preserve the essence of the test, we insert a couple dummy lines to split up the blocks. * The JSON CRLF test. It was testing that we didn't output any messages with an empty 'submatches' array. That is indeed still the case. The difference is that the messages got combined because of the adjacent line merging behavior. This is a slight change to the output, but is still correct. Fixes #1311
Hi @BurntSushi! Just wanted to thank you for fixing this. I have just encountered this very bug, and it is very pleasing to see "closed 16 days ago"! 👍 |
What version of ripgrep are you using?
ripgrep 11.0.1 (rev 34677d2)
+SIMD -AVX (compiled)
+SIMD +AVX (runtime)
How did you install ripgrep?
cargo from git
What operating system are you using ripgrep on?
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
Describe your question, feature request, or bug.
I was trying to replace some of my usages of other commands (
sed
,tr
,ag
) with ripgrep now that it has proper multiline and pcre2 support features, and ran into the following odd behavior:This works fine, everything matches, nothing is highlighted because the newline can't be colored.
But when I try to replace, I get the following behavior:
which isn't correct because the newline has been correctly matched but incorrectly retained in the output.
Whereas the following does work:
But I don't see any documented reason why the former shouldn't give the same output as well. I can understand technically why this is happening (rg internally preserves the concept of lines but merely glosses over them for search purposes with
-U
), but the confluence of both-U
and-r
results in incorrect behavior: the output should either include the replacement character and no new lines (desirable) or the newlines and no replacement character (unfortunate but understandable limitation). I don't think including both can be considered correct, however.The text was updated successfully, but these errors were encountered: