Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dedicated function for selecting from current data #6204

Closed
mikkmart opened this issue Mar 3, 2022 · 6 comments · Fixed by #6492
Closed

Dedicated function for selecting from current data #6204

mikkmart opened this issue Mar 3, 2022 · 6 comments · Fixed by #6492
Assignees
Labels
each-col ↔️ feature a feature request or enhancement

Comments

@mikkmart
Copy link

mikkmart commented Mar 3, 2022

Would it be useful to have a dedicated function (say, pick()) to select columns from the current data? Currently, across() with only a .cols argument serves this role.

I would see a dedicated function having at least three advantages:

  1. Nicer syntax for union selections: pick(1, last_col()) vs. across(c(1, last_col())).
  2. Better semantics. across() makes sense when there’s functions to apply, but less so when it’s used just for selecting columns. pick() seems intuitive for only selecting columns.
  3. Reuse existing patterns: across(c(1:2, 4), mean) vs. map_df(pick(1:2, 4), mean). The first requires you to know that across() can select columns and apply a function, latter can re-use existing function application methods.

The last point is particularly important if/when ... is deprecated in across() (#6073), as funtionality would not be identical anymore. For example:

# With no ..., need to use an anonymous function for na.rm
across(c(1, 3:4), ~ mean(., na.rm = TRUE))

# Could be avoided with `pick()`
map_df(pick(1, 3:4), mean, na.rm = TRUE)

I would see the primary uses for this as:

  1. Replace across() in e.g. group_by() selections group_by(across(c(1, 3:5))) vs. group_by(pick(1, 3:5)). Big semantic and syntactic win, IMO.
  2. Passing arguments to functions that take data frame or matrix arguments. For example common questions about taking means or sums over rows in data frames. In my experience people don’t think to apply(across(1:5), 1, f), but apply(pick(1:5), 1, f) might be more intuitive.

I could think of two ways to implement this as a wrapper:

pick <- function(...) {
  across(.cols = c(...))
}

Or:

pick <- function(...) {
  select(cur_data(), ...)
}

Although, particularly with the across() route, it would seem nicer to reverse the dependency and extract the relevant parts from across() intopick() instead.

I appreciate your consideration for this feature request.

@romainfrancois romainfrancois added the feature a feature request or enhancement label Mar 3, 2022
@DavisVaughan
Copy link
Member

We've also discussed letting cur_data() have a cols argument, or ..., for this purpose.

And possibly a groups = TRUE/FALSE argument to replace cur_data_all()

@hadley
Copy link
Member

hadley commented Aug 1, 2022

This function feels conceptually more like across() than cur_*(), so I like pick() as a name. Maybe it could include a .group_vars argument? i.e. something like:

pick <- function(..., .group_vars = TRUE) {
  data <- if (isTRUE(.group_vars)) cur_data_all() else cur_data()
  select(data, ...)
}

I'm not sure if it's worth attempting to optimise this further.

@hadley hadley changed the title Dedicated function for selecting from current data [FR] Dedicated function for selecting from current data Aug 1, 2022
@DavisVaughan DavisVaughan self-assigned this Aug 22, 2022
@DavisVaughan
Copy link
Member

We should probably work towards deprecating cur_data() and cur_data_all() in favor of pick() too

@hadley
Copy link
Member

hadley commented Sep 14, 2022

When we implement this I assume we'll also change:

across(.cols = everything(), .fns = NULL, ..., .names = NULL, .unpack = FALSE)
if_any(.cols = everything(), .fns = NULL, ..., .names = NULL)
if_all(.cols = everything(), .fns = NULL, ..., .names = NULL)

To

across(.cols, .fns, ..., .names = NULL, .unpack = FALSE)
if_any(.cols, .fns, ..., .names = NULL)
if_all(.cols, .fns, ..., .names = NULL)

Possible with deprecation, something like:

if (missing(.cols)) {
  lifecycle::deprecate_warn("1.1.0", I("across() without `.cols`"), I("`everything()` to select all columns"))
}

@DavisVaughan
Copy link
Member

Yea and .cols would continue to work, like:

if (missing(.cols)) {
  lifecycle::deprecate_warn("1.1.0", I("across() without `.cols`"), I("`everything()` to select all columns"))
  .cols <- quote(everything())
}

@hadley
Copy link
Member

hadley commented Sep 22, 2022

Also contrary to my earlier example, group_vars absolutely needs to default to FALSE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
each-col ↔️ feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants