dplyr and tibble collision possible bug #1936
Comments
That will be the default as soon as dplyr 0.5 is out next week |
This seems like a dangerous change. Won't it break any packages that do something like |
yeah this is a huge change. This is how it should have worked from day 1, but it's going to be hard when data.frame still returns NULL |
I'm seeing:
We should do tidyverse/tibble#102 as an alternative. |
We'll see. Having |
has_column() should be included when this change goes live, that's a really good idea But also, don't you think that this change will cause a lot of confusion considering that |
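A minimal sketch of the kind of check a `has_column()` helper could perform, written in base R. The function body, argument names, and the `df` example are assumptions made for illustration; this is not the tibble implementation.

```r
# Hypothetical helper, written in base R for illustration only.
# It is not taken from tibble or dplyr.
has_column <- function(data, name) {
  stopifnot(is.character(name), length(name) == 1L)
  name %in% names(data)
}

df <- data.frame(foo = 1:3)

# Guard the access explicitly instead of testing the result of `$` for NULL.
qux <- if (has_column(df, "qux")) df[["qux"]] else NULL
```

Package code that checks for a column this way does not depend on whether `$` returns NULL or throws an error for an unknown name.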
Perhaps To be honest, though, I don't like this change at all -- I think it's well understood that |
I got hurt badly once when trying to replace all data frames by data tables in existing code. Still, data tables are data frames. I think we should be fine if dplyr makes sure that all operations return data frames from df input, and tibbles from tibble input. Switching to tibble will then be a choice made consciously, with all its alleged pros and cons. Are there currently operations that return a tibble when passed a df? |
Hmmmmm, maybe I didn't think this fully through - I didn't think about the use case of OTOH, there are lots of subtle bugs that people are probably currently missing:
df <- data.frame(xyz = 1:3)
is.null(df$x)
# [1] FALSE
And the current behaviour of |
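The partial-matching example above can be expanded into a short base R sketch that contrasts `$` with `[[` on a plain data frame. The variable names are illustrative, and nothing here involves tibble or dplyr.

```r
# Base R only; no tibble or dplyr involved.
df <- data.frame(xyz = 1:3)

# `$` partially matches column names, so `x` silently resolves to `xyz`.
partial <- df$x

# `[[` matches exactly by default, so the same lookup returns NULL.
exact <- df[["x"]]

# A name that matches no column, even partially, gives NULL with `$`.
missing_col <- df$qux

# An explicit membership test avoids relying on either behaviour.
has_x <- "x" %in% names(df)
```

This is why `is.null(df$x)` is FALSE above: the partial match returns the `xyz` column, so a NULL check on `$` can pass even though no column named `x` exists.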
In my mind, this change might potentially require some changes to user code, but it forces an early error in many cases where you currently get a late error. Somewhat unusually (compared to the rest of tibble), this change is most important when working interactively, and less important when programming. |
I had thought that this was only in the dev version of tibble, but it is in the released version, which has been available for ~2 months. This is the first complaint I've heard, but that might be because most people are exposed to tibbles through dplyr. Maybe we should not throw an error for |
Then we'd have:
That seems like a reasonable set of principles, and a reasonable compromise with the current behaviour. |
Not the other way round -- |
Ah, I see. The original complaint was that $ throws an error, though. |
I'd argue that if an operator is used in packages, we want to have an error, just like the current behavior of |
I think we do want to have an error but we need some easy way that people can preserve existing behaviour - relying on a function in tibble isn't sufficient, because the package might not use tibble. We could encourage people to use I've been thinking about framing tibbles as lazy, curmudgeonly data frames, i.e. they only do the minimal amount of work (they never coerce) and they complain whenever something isn't as they expect. |
I agree with @kevinushey that this change is surprising given the usual behavior of Still, an opt-out option could at least ease the transition. |
Closing here since implementation needs to happen in tibble. |
Using R 3.3.0, dplyr 0.4.3, tibble 1.0
This returns NULL, as expected, because the column 'qux' doesn't exist.
Now run this:
This tells me "Error: Unknown column 'qux'" ...which doesn't make any sense because I haven't changed or touched x.
Note that this bug was discovered during our migration from R 3.2 to R 3.3.