Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize 'Selecting variables' section #61

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 31 additions & 25 deletions dots-data.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,23 @@ If you want your vector to have names, the problem is harder, and there's relati

## Selecting variables

A number of funtions in the tidyverse use `...` for selecting variables. For example, `tidyr::fill()` lets you fill in missing values based on the previous row:
A number of funtions in the tidyverse, like dplyr's scoped verbs and `ggplot2::facet_grid()`, require the user to explicitly quote the input. I now believe that this is a suboptimal interface because it is more typing (`var()` is longer than `c()`, and you must quote even single variables), and arguments that require their inputs to be explicitly quoted are rare in the tidyverse.

```{r, eval = FALSE}
# existing interface
dplyr::mutate_at(mtcars, vars(cyl:vs), mean)
# what I would create today
dplyr::mutate_at(mtcars, c(cyl:vs), mean)

# existing interface
ggplot2::facet_grid(rows = vars(drv), cols = vars(vs, am))
# what I would create today
ggplot2::facet_grid(rows = drv, cols = c(vs, am))
```

The `vars()` function is a simple wrapper around `quos()`, and adds no value compared to using `c()` directly, which requires quoting by the called function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say it adds no value. It can be useful to take selections in standard eval functions.


Many other functions use `...` for selecting variables. For example, `tidyr::fill()` lets you fill in missing values based on the previous row:

```{r}
df <- tribble(
Expand All @@ -120,7 +136,18 @@ df <- tribble(
df %>% fill(year, month)
```

All functions that work like this include a call to `tidyselect::vars_select()` that looks something like this:
This interface is inconsistent with e.g. `mutate_at()` and suffers from the same problem as `sum()`: we're using `...` to only save a little typing.
In other words, I believe that better interface to `fill()` would be:

```{r, eval = FALSE}
df %>% fill(c(year, month))
```

That said, it is unlikely we will ever change functions, because the benefit is smaller (primarily improved consistency) and the costs are high, as it impossible to switch from an evaluated argument to a quoted argument without breaking backward compatibility in some small percentage of cases.

### Implementation

All functions that work like `fill()` include a call to `tidyselect::vars_select()` that looks something like this:

```{r}
find_vars <- function(data, ...) {
Expand All @@ -130,7 +157,8 @@ find_vars <- function(data, ...) {
find_vars(df, year, month)
```

I now think that this interface is a mistake because it suffers from the same problem as `sum()`: we're using `...` to only save a little typing. We can eliminate the use of dots by requiring the user to use `c()`. (This change also requires explicit quoting and unquoting of `vars` since we're no longer using `...`.)

We can eliminate the use of dots by requiring the caller to use `c()`. (This change also requires explicit quoting and unquoting of `vars` since we're no longer using `...`.)

```{r}
foo <- function(data, vars) {
Expand All @@ -139,25 +167,3 @@ foo <- function(data, vars) {

find_vars(df, c(year, month))
```

In other words, I believe that better interface to `fill()` would be:

```{r, eval = FALSE}
df %>% fill(c(year, month))
```

Other tidyverse functions like dplyr's scoped verbs and `ggplot2::facet_grid()` require the user to explicitly quote the input. I now believe that this is also a suboptimal interface because it is more typing (`var()` is longer than `c()`, and you must quote even single variables), and arguments that require their inputs to be explicitly quoted are rare in the tidyverse.

```{r, eval = FALSE}
# existing interface
dplyr::mutate_at(mtcars, vars(cyl:vs), mean)
# what I would create today
dplyr::mutate_at(mtcars, c(cyl:vs), mean)

# existing interface
ggplot2::facet_grid(rows = vars(drv), cols = vars(vs, am))
# what I would create today
ggplot2::facet_grid(rows = drv, cols = c(vs, am))
```

That said, it is unlikely we will ever change functions, because the benefit is smaller (primarily improved consistency) and the costs are high, as it impossible to switch from an evaluated argument to a quoted argument without breaking backward compatibility in some small percentage of cases.