Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dplyr::rename with fixed new-names and old-names given by length-1 named vectors gives wrong names #5207

Closed
pnacht opened this issue May 7, 2020 · 6 comments
Labels
bug an unexpected problem or unintended behavior verbs 🏃‍♀️

Comments

@pnacht
Copy link

pnacht commented May 7, 2020

A one-line reprex of this behavior is:

dplyr::rename(data.frame(old = 1), new = c(dud = "old"))
#>   new...dud
#> 1         1

This is an issue if we have a function where data.frames with myriad names have their names changed to certain "standard" names. In the case below, I have a standard column-name foo, but a data.frame's called it bar instead, so we call the following function to give it the standard name

standardize <- function(df, fooName) {
  dplyr::rename(df, foo = !!fooName)
}

standardize(data.frame(bar = 1), "bar")
#>   foo
#> 1   1

Created on 2020-05-07 by the reprex package (v0.3.0)

This works just fine.

However, if, for whatever reason, the value passed to fooName is a length-1 named vector, we get the behavior shown at the start:

name <- c(buzz = "bar")
standardize(data.frame(bar = 1), name)
#>   foo...buzz
#> 1          1

The vector's name is clumsily suffixed to the desired name with an ellipsis. Is this expected behavior? I'm currently solving this by using foo = unname(fooName), which works, but wonder if this is just a bug.

@romainfrancois
Copy link
Member

With the pr #5224, you'd get:

library(dplyr, warn.conflicts = FALSE)

standardize <- function(df, fooName) {
  dplyr::rename(df, foo = !!fooName)
}

standardize(data.frame(bar = 1), "bar")
#>   foo
#> 1   1
name <- c(buzz = "bar")
standardize(data.frame(bar = 1), name)
#>   buzz
#> 1    1

Created on 2020-05-11 by the reprex package (v0.3.0)

but I'm not sure this is what's expected

@pnacht
Copy link
Author

pnacht commented May 11, 2020

It is not. If the desired name (foo) is explicitly given, it seems most reasonable to me that any names defined in the character vector should be ignored.

For the record, I only defined standardize using the bang-bang operator (!!fooName) in the original example to hide the warning otherwise shown (regarding ambiguity of using variable names).

@lionel-
Copy link
Member

lionel- commented May 11, 2020

I think by default it should be an error to supply both outer and inner names.

The problem with letting the outer name override the names of the character vector is that it isn't generalisable when the vector is length > 1. So the proper syntax would be rename(!!c(foo = name)).

@hadley hadley added bug an unexpected problem or unintended behavior verbs 🏃‍♀️ labels May 11, 2020
@mkuhring
Copy link

I ran into similar issues when using named vectors with mutate_at and across. I expect it to be related, but I can open a new issue if preferred.

Both methods make use (or want to) of the names of vector, while I would expect them to just use the actual values. mutate_at creates new columns based on the names (though I could swear I also had cases where it renamed the original ones, but can't reproduce). across fails because it seems to want to use the names only to work with the columns.

Here is an example to demonstrate (using dplyr version 1.0.0):

# Example data frame and function
my_data <- data.frame(A = 1:3, B = c("a", "b", "c"), C = 7:9)
my_fun <- function(x){x+1}

# Selecting columns with a normal vector
my_cols <- c("A", "C")
dplyr::mutate_at(my_data, .vars = dplyr::all_of(my_cols), .funs = my_fun)
#>   A B  C
#> 1 2 a  8
#> 2 3 b  9
#> 3 4 c 10
dplyr::mutate(my_data, dplyr::across(.cols = dplyr::all_of(my_cols), .fns = my_fun))
#>   A B  C
#> 1 2 a  8
#> 2 3 b  9
#> 3 4 c 10

# Selecting columns with a named vector
my_cols <- c(NameA = "A", NameC = "C")
dplyr::mutate_at(my_data, .vars = dplyr::all_of(my_cols), .funs = my_fun)
#>   A B C NameA NameC
#> 1 1 a 7     2     8
#> 2 2 b 8     3     9
#> 3 3 c 9     4    10
dplyr::mutate(my_data, dplyr::across(.cols = dplyr::all_of(my_cols), .fns = my_fun))
#> Error: Problem with `mutate()` input `..1`.
#> x object 'NameA' not found
#> i Input `..1` is `dplyr::across(.cols = dplyr::all_of(my_cols), .fns = my_fun)`.

Created on 2021-03-23 by the reprex package (v0.3.0)

@romainfrancois
Copy link
Member

#5889 fixes the across() issue.

@romainfrancois
Copy link
Member

Closing this now, as the across() issue has been dealt with, and this feels ok:

library(dplyr, warn.conflicts = FALSE)

dplyr::rename(data.frame(old = 1), new = c(dud = "old"))
#>   new...dud
#> 1         1

Created on 2021-05-19 by the reprex package (v2.0.0)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior verbs 🏃‍♀️
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants