search: add support for counting individual matches (Fixes #566) #814

Closed
wants to merge 2 commits into from

Conversation

Contributor

@balajisivaraman balajisivaraman commented Feb 17, 2018

As discussed in #566, this PR adds a --count-matches flag that will count the individual matches instead of the matching lines.

  • This does essentially the same thing printer.matched does for printing only matches, which is to use the Regex's find_iter method for getting the individual matches. We then increment a variable to keep track of the count, ignoring the actual match itself. Since this will always happen behind a flag, I hope this won't have any performance implications.

  • I initially considered using the existing match_count variable in search_stream and search_buffer to track individual matches when --count-matches is passed. That would have been handy if we later wanted to use the individual match count for something like the --stats flag, since the value is returned all the way up to main.rs, where we could use it.

    However, I decided against this because it conflicts with the existing --max-count argument, which terminates the search early once the maximum number of matching lines is reached. As a result, even after this change rg still terminates based on the matching line count, not the individual match count. I'm just throwing this out there for your consideration.

    For now, I've renamed the existing match_count to matching_line_count to better reflect what it tracks. match_count is now an Option<usize> and is used to track the individual match count.

  • -c/--count and --count-matches will override each other.
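To make the difference between the two counters concrete, here is a minimal sketch. It is not ripgrep's code: `count_lines_and_matches` and its parameters are hypothetical names, and `str::matches` on a literal pattern stands in for the regex `find_iter` call described above. It illustrates that `max_count` stops the search based on matching lines, while the individual match count is only tracked when requested.

```rust
/// Hypothetical helper (not ripgrep's API). Returns
/// (matching line count, individual match count). The match count is
/// `None` unless requested, mirroring the `Option` described above.
fn count_lines_and_matches(
    haystack: &str,
    pattern: &str, // literal stand-in for a regex; assumed non-empty
    count_matches: bool,
    max_count: Option<u64>,
) -> (u64, Option<u64>) {
    let mut matching_line_count = 0u64;
    let mut match_count = if count_matches { Some(0u64) } else { None };

    for line in haystack.lines() {
        // Early termination is driven by matching lines, not by matches.
        if max_count.map_or(false, |max| matching_line_count >= max) {
            break;
        }
        let n = line.matches(pattern).count() as u64;
        if n == 0 {
            continue;
        }
        matching_line_count += 1;
        // Only the number of matches is kept; the matches are discarded.
        if let Some(count) = match_count.as_mut() {
            *count += n;
        }
    }
    (matching_line_count, match_count)
}
```

With the input `"foo foo\nbar\nfoo\n"` and the pattern `foo`, this reports two matching lines but three individual matches.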

Owner

@BurntSushi BurntSushi left a comment

@balajisivaraman This looks great to me! I left some comments but I think they are all pretty minor!

@@ -21,7 +21,8 @@ pub struct BufferSearcher<'a, W: 'a> {
     grep: &'a Grep,
     path: &'a Path,
     buf: &'a [u8],
-    match_count: u64,
+    matching_line_count: u64,
Owner

Could you name this match_line_count so that it's more consistent with match_count?

Owner

Also, in the future (no need to do this for this PR), it would be super helpful to put renames (and similarly uninteresting changes) in a separate commit. In this case, that would be a commit that comes before the principal change. The reason is that I don't really care about the mechanics of a name change so long as the compiler is happy, but it adds noise to the diff among the other changes that I do want to look at carefully.

It can be hard to do this though because it does require some forethought, so don't feel like I'm imposing a strong requirement! But if you think of it and it's not too much work, it definitely helps the review process.

Contributor Author

Ah, this does indeed make a lot of sense to me. I understand completely and will make the change in a separate commit for this PR as well. No issues. :-) Good housekeeping is always welcome.

if self.opts.count && self.matching_line_count > 0 {
    self.printer.path_count(self.path, self.matching_line_count);
} else if self.opts.count_matches
    && self.match_count.map_or(false, |c| c > 0) {
Owner

When wrapping a conditional to multiple lines, could you move the brace down to the next line so that the division between the conditional and the conditional body is more clear?
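The requested layout could look like the following sketch. The `Opts` struct and the `count_to_print` function are hypothetical stand-ins for the searcher's fields and methods so that the snippet compiles on its own; only the brace placement is the point.

```rust
/// Hypothetical stand-in for the searcher's options.
struct Opts {
    count: bool,
    count_matches: bool,
}

/// Hypothetical helper: returns the count to print for a path, if any.
fn count_to_print(
    opts: &Opts,
    matching_line_count: u64,
    match_count: Option<u64>,
) -> Option<u64> {
    if opts.count && matching_line_count > 0 {
        Some(matching_line_count)
    } else if opts.count_matches
        && match_count.map_or(false, |c| c > 0)
    {
        // The opening brace sits on its own line, so the wrapped
        // condition is visually separate from the body.
        match_count
    } else {
        None
    }
}
```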

let mut last_end = 0;
for m in self.grep.iter(self.buf) {
    if self.opts.invert_match {
        self.print_inverted_matches(last_end, m.start());
    } else {
        self.print_match(m.start(), m.end());
        self.count_individual_matches(m.start(), m.end());
Owner

Would it be possible to move this call into print_match itself?
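One way to read this suggestion is sketched below. The `Searcher` here is a stripped-down, hypothetical type rather than ripgrep's: a `Vec<String>` stands in for the printer, and a literal pattern with `str::matches` stands in for the regex. The idea is that callers make a single `print_match` call and the counting happens inside it.

```rust
/// Hypothetical, stripped-down searcher; not ripgrep's actual type.
struct Searcher<'a> {
    buf: &'a str,
    pattern: &'a str,         // literal stand-in for the regex
    match_count: Option<u64>, // `Some` only when counting matches
    printed: Vec<String>,     // stand-in for the printer's output
}

impl<'a> Searcher<'a> {
    /// Records the matching line and also updates the individual
    /// match count, so call sites need only this one call.
    fn print_match(&mut self, start: usize, end: usize) {
        let line = &self.buf[start..end];
        self.printed.push(line.to_string());
        self.count_individual_matches(start, end);
    }

    /// Adds the number of matches in `buf[start..end]` to the running
    /// total; does nothing when match counting is disabled.
    fn count_individual_matches(&mut self, start: usize, end: usize) {
        if let Some(count) = self.match_count.as_mut() {
            *count += self.buf[start..end].matches(self.pattern).count() as u64;
        }
    }
}
```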

@@ -291,6 +305,7 @@ impl<'a, R: io::Read, W: WriteColor> Searcher<'a, R, W> {
     self.print_after_context(start);
     self.print_before_context(start);
     self.print_match(start, end);
+    self.count_individual_matches(start, end);
Owner

Same as for the mmap searcher. Can you move this into the print_match method?

@BurntSushi
Owner

@balajisivaraman OK, I finally got a chance to merge this today in 27fc9f2. All I did was fix up the conflicts with the latest master and reformat the commit message. (Commit messages should note the issues they close.) I also added a commit that causes --count --only-matching to behave the same as --count-matches. (The next release of ripgrep will be 0.9.0, so this is OK.)
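The resulting flag behavior can be summarized with a small sketch. `CountMode` and `resolve_count_mode` are hypothetical names and do not reflect how ripgrep's argument handling is actually structured. The sketch assumes the flags have already been parsed, so `count` and `count_matches` are not both set (they override each other).

```rust
/// Hypothetical summary of what gets counted; not ripgrep's types.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum CountMode {
    Off,
    Lines,
    Matches,
}

/// Maps already-parsed flags to a counting mode. `--count` combined
/// with `--only-matching` is treated the same as `--count-matches`.
fn resolve_count_mode(count: bool, count_matches: bool, only_matching: bool) -> CountMode {
    if count_matches || (count && only_matching) {
        CountMode::Matches
    } else if count {
        CountMode::Lines
    } else {
        CountMode::Off
    }
}
```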
