-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Implement filter_out()
#7775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement filter_out()
#7775
Conversation
ea193ab to
3155f84
Compare
BECAUSE THIS IS HARD Y'ALL
| * New experimental `filter_out()` companion to `filter()`. | ||
|
|
||
| * Use `filter()` when specifying rows to _keep_. | ||
|
|
||
| * Use `filter_out()` when specifying rows to _drop_. | ||
|
|
||
| `filter_out()` simplifies cases where you would have previously used a `filter()` to drop rows. It is particularly useful when missing values are involved. For example, to drop rows where the `count` is zero: | ||
|
|
||
| ```r | ||
| df |> filter(count != 0 | is.na(count)) | ||
|
|
||
| df |> filter_out(count == 0) | ||
| ``` | ||
|
|
||
| With `filter()`, you must provide a "negative" condition of `!= 0` and must explicitly guard against accidentally dropping rows with `NA`. With `filter_out()`, you directly specify rows to drop and you don't have to guard against dropping rows with `NA`, which tends to result in much clearer code. | ||
|
|
||
| This work is a result of [Tidyup 8: Expanding the `filter()` family](https://github.com/tidyverse/tidyups/pull/30), with a lot of great feedback from the community (#6560, #6891). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Focus here
| #' @rdname filter | ||
| #' @export | ||
| filter_out <- function(.data, ..., .by = NULL, .preserve = FALSE) { | ||
| check_by_typo(...) | ||
| check_not_both_by_and_preserve({{ .by }}, .preserve) | ||
| UseMethod("filter_out") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual implementation is quite simple
- Extract out a common
filter_impl() - Provide
.verb = "filter"or.verb = "filter_out"- If
filter_out, provideinvert = TRUEto the C implementation of filter, which inverts the final result on the way out
- If
| @@ -1,17 +1,108 @@ | |||
| #' Keep rows that match a condition | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a full doc overhaul. Things worth spending time on:
- The Description section
- Note that I've declared
filter_out()as experimental
- Note that I've declared
@section Missing values:- Examples
| if (LOGICAL_ELT(invert, 0)) { | ||
| for (R_xlen_t i = 0; i < n; ++i) { | ||
| p_keep[i] = !p_keep[i]; | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I carefully looked at all filter() tests and added a corresponding filter_out() test if it felt like the test wasn't too niche and generally tested some kind of invariant (0 row behavior, no input behavior, etc)
| `across()` doesn't work with `select()` or `rename()` because they already use tidy select syntax; if you want to transform column names with a function, you can use `rename_with()`. | ||
| ### filter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tweaked the vignettes only where it felt like using filter_out() was a noticeable improvement
Closes #6560
Closes #6891
Part of tidyverse/tidyups#30