filter(.missing = ) option to optionally retain missing values #6560

@DavisVaughan

Currently, filter():

  • Retains TRUE
  • Drops FALSE and NA
  • This matches subset()

A number of requests have come up in the past asking for the alternative behavior:

  • Retains TRUE and NA
  • Drops FALSE
  • This matches [

Here are a few:

This is most noticeably annoying when you have multiple columns to filter by, because every condition needs its own is.na() check:

library(dplyr)

df <- tibble(
  x = c(TRUE, FALSE, NA, NA, NA),
  y = c(NA, TRUE, NA, NA, NA),
  z = c(TRUE, TRUE, TRUE, FALSE, NA)
)
df
#> # A tibble: 5 × 3
#>   x     y     z    
#>   <lgl> <lgl> <lgl>
#> 1 TRUE  NA    TRUE 
#> 2 FALSE TRUE  TRUE 
#> 3 NA    NA    TRUE 
#> 4 NA    NA    FALSE
#> 5 NA    NA    NA

filter(df, x, y, z)
#> # A tibble: 0 × 3
#> # … with 3 variables: x <lgl>, y <lgl>, z <lgl>

filter(df, x | is.na(x), y | is.na(y), z | is.na(z))
#> # A tibble: 3 × 3
#>   x     y     z    
#>   <lgl> <lgl> <lgl>
#> 1 TRUE  NA    TRUE 
#> 2 NA    NA    TRUE 
#> 3 NA    NA    NA

I propose a .missing = c("drop", "keep", "error") argument to filter() that would allow you to optionally keep rows with NA.
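
As a rough illustration of the proposed semantics, the sketch below emulates the three modes in base R on plain logical vectors. The helper name `combine_conditions()` is hypothetical and not part of dplyr. Reading "error" as "abort if the combined condition is NA for any row" is an assumption, since the proposal does not spell that mode out.

```r
# Hypothetical helper, not a dplyr function: combine logical conditions with
# `&`, then resolve rows whose combined result is NA according to `.missing`.
combine_conditions <- function(..., .missing = c("drop", "keep", "error")) {
  .missing <- match.arg(.missing)
  keep <- Reduce(`&`, list(...))

  if (.missing == "error" && anyNA(keep)) {
    # Assumed meaning of "error": refuse to guess when a row is undetermined.
    stop("Filter conditions evaluated to NA for at least one row.")
  }

  # "drop" turns NA into FALSE (today's behaviour), "keep" turns NA into TRUE.
  keep[is.na(keep)] <- (.missing == "keep")
  keep
}

# Same columns as the example data above.
x <- c(TRUE, FALSE, NA, NA, NA)
y <- c(NA, TRUE, NA, NA, NA)
z <- c(TRUE, TRUE, TRUE, FALSE, NA)

drop_mask <- combine_conditions(x, y, z, .missing = "drop")
keep_mask <- combine_conditions(x, y, z, .missing = "keep")

which(drop_mask) # no rows survive
which(keep_mask) # rows 1, 3 and 5 survive
```

With "keep", the surviving rows are the same three rows returned by the explicit `x | is.na(x), y | is.na(y), z | is.na(z)` call above.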

We'd have to carefully analyze the boolean algebra here to make sure we are being consistent. In particular, comma-separated conditions and conditions combined with & should give the same result under each option, and I think they do:

# these should be the same
filter(df, x, y, .missing = "drop")
filter(df, x & y, .missing = "drop")

# these should be the same
filter(df, x, y, .missing = "keep")
filter(df, x & y, .missing = "keep")

The "drop" case is probably already consistent because that is what we do today. The "keep" case probably amounts to converting NA to TRUE, as below, which also seems consistent:

na_to_true <- function(x) {
  x[is.na(x)] <- TRUE
  x
}

na_to_true(TRUE & NA)
#> [1] TRUE
na_to_true(TRUE) & na_to_true(NA)
#> [1] TRUE
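
The single case above can be extended to every pairing of TRUE, FALSE, and NA. The sketch below is a base R check, not dplyr code. `na_to_false()` is a hypothetical counterpart to `na_to_true()`, added here only to model the "drop" behaviour.

```r
na_to_true <- function(x) {
  x[is.na(x)] <- TRUE
  x
}

# Hypothetical counterpart modelling "drop": NA is treated as FALSE.
na_to_false <- function(x) {
  x[is.na(x)] <- FALSE
  x
}

# All nine (a, b) pairings of TRUE, FALSE, and NA.
values <- c(TRUE, FALSE, NA)
a <- rep(values, times = 3)
b <- rep(values, each = 3)

# "keep": replacing NA after `&` should match replacing NA before `&`.
keep_consistent <- identical(
  na_to_true(a & b),
  na_to_true(a) & na_to_true(b)
)

# "drop": the same comparison with NA treated as FALSE.
drop_consistent <- identical(
  na_to_false(a & b),
  na_to_false(a) & na_to_false(b)
)

keep_consistent
#> [1] TRUE
drop_consistent
#> [1] TRUE
```

Both comparisons hold for all nine pairings, which suggests that comma-separated conditions and `&` would agree under either option.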

When we do this, we should also think about whether vec_pall() or vec_pany() could be used in filter() in any way, since they are heavily optimized for performance.
