@DavisVaughan DavisVaughan commented Nov 25, 2025

Closes #6560
Closes #6891

Part of tidyverse/tidyups#30

BECAUSE THIS IS HARD Y'ALL
Comment on lines +3 to +19
* New experimental `filter_out()` companion to `filter()`.

* Use `filter()` when specifying rows to _keep_.

* Use `filter_out()` when specifying rows to _drop_.

`filter_out()` simplifies cases where you would have previously used a `filter()` to drop rows. It is particularly useful when missing values are involved. For example, to drop rows where the `count` is zero:

```r
df |> filter(count != 0 | is.na(count))

df |> filter_out(count == 0)
```

With `filter()`, you must provide a "negative" condition of `!= 0` and must explicitly guard against accidentally dropping rows with `NA`. With `filter_out()`, you directly specify rows to drop and you don't have to guard against dropping rows with `NA`, which tends to result in much clearer code.

This work is a result of [Tidyup 8: Expanding the `filter()` family](https://github.com/tidyverse/tidyups/pull/30), with a lot of great feedback from the community (#6560, #6891).
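The missing-value behavior described in this NEWS entry can be illustrated without dplyr. The following is a minimal base R sketch, assuming a hypothetical toy data frame `df`; it mimics the keep and drop semantics by hand and does not call `filter_out()` itself.

```r
# Hypothetical toy data; `count` contains one missing value.
df <- data.frame(id = 1:4, count = c(0, 3, NA, 5))

# Keep-style condition: a row is kept only when the condition is TRUE,
# so the NA row has to be rescued explicitly with is.na().
cond_keep <- df$count != 0 | is.na(df$count)
kept <- df[cond_keep %in% TRUE, , drop = FALSE]

# Drop-style condition: a row is dropped only when the condition is TRUE,
# so an NA result leaves the row in place with no extra guard.
cond_drop <- df$count == 0
remaining <- df[!(cond_drop %in% TRUE), , drop = FALSE]

# Both approaches retain the same rows (ids 2, 3, and 4).
identical(kept, remaining)
```

The `%in% TRUE` idiom maps `NA` to `FALSE`, which is what makes the drop-style version safe to write without an `is.na()` guard.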
Member Author

Focus here

Comment on lines +217 to +223
#' @rdname filter
#' @export
filter_out <- function(.data, ..., .by = NULL, .preserve = FALSE) {
  check_by_typo(...)
  check_not_both_by_and_preserve({{ .by }}, .preserve)
  UseMethod("filter_out")
}
Member Author

The actual implementation is quite simple

  • Extract out a common `filter_impl()`
  • Provide `.verb = "filter"` or `.verb = "filter_out"`
    • If `filter_out`, provide `invert = TRUE` to the C implementation of filter, which inverts the final result on the way out
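The structure outlined above can be sketched in base R. This is only an illustration under stated assumptions: `filter_impl_sketch()` and its arguments are hypothetical names, the condition is passed as a plain logical vector, and none of this is dplyr's actual code.

```r
# Hypothetical stand-in for a shared implementation; not dplyr's actual code.
filter_impl_sketch <- function(data, cond, verb = c("filter", "filter_out")) {
  verb <- match.arg(verb)

  # Treat NA as "condition not met", so only TRUE counts as a match.
  keep <- cond %in% TRUE

  # For the drop-style verb, invert the final result on the way out.
  if (verb == "filter_out") {
    keep <- !keep
  }

  data[keep, , drop = FALSE]
}

# Hypothetical toy data with one missing value.
df <- data.frame(id = 1:4, count = c(0, 3, NA, 5))

kept_rows <- filter_impl_sketch(df, df$count == 0, verb = "filter")
dropped_rows <- filter_impl_sketch(df, df$count == 0, verb = "filter_out")
```

Because the inversion happens after `NA` has already been mapped to `FALSE`, the row with a missing `count` ends up in the `filter_out` result and not in the `filter` result.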

@@ -1,17 +1,108 @@
#' Keep rows that match a condition
Member Author

Did a full doc overhaul. Things worth spending time on:

  • The Description section
    • Note that I've declared `filter_out()` as experimental
  • `@section Missing values:`
  • Examples

Comment on lines +185 to +189
if (LOGICAL_ELT(invert, 0)) {
  for (R_xlen_t i = 0; i < n; ++i) {
    p_keep[i] = !p_keep[i];
  }
}
Member Author

That's it!

Member Author

I carefully looked at all `filter()` tests and added a corresponding `filter_out()` test whenever the test wasn't too niche and generally tested some kind of invariant (0-row behavior, no-input behavior, etc.).

`across()` doesn't work with `select()` or `rename()` because they already use tidy select syntax; if you want to transform column names with a function, you can use `rename_with()`.
### filter()
Member Author

Tweaked the vignettes only where it felt like using `filter_out()` was a noticeable improvement.

Successfully merging this pull request may close these issues:

  • The case for `filter(.missing = NULL, .how = c("keep", "drop"))`
  • `filter(.missing = )` option to optionally retain missing values