dplyr and tibble collision possible bug #1936

ghost · 2016-06-17T23:50:04Z

Using R 3.3.0, dplyr 0.4.3, tibble 1.0

x <- dplyr::data_frame(foo = "bar")
x$qux

This returns NULL, as expected because the column 'qux' doesn't exits.

Now run this:

y <- tibble::data_frame(fizz = "buzz")
x$qux

This tells me "Error: Unknown column 'qux'" ...which doesn't make any sense because I haven't changed or touched x.

Note that this bug was discovered during our migration from 3.2 to 3.3

The text was updated successfully, but these errors were encountered:

hadley · 2016-06-18T01:29:13Z

That will be the default as soon as dplyr 0.5 is out next week

kevinushey · 2016-06-18T05:20:27Z

This seems like a dangerous change. Won't it break any packages that do something like is.null(df$foo) to test for the existence of foo in df?

ghost · 2016-06-18T05:32:40Z

yeah this is a huge change. This is how it should have worked from day 1, but it's going to be hard when data.frame still returns NULL

krlmlr · 2016-06-18T13:03:23Z

I'm seeing:

> x <- dplyr::data_frame(foo = "bar")
> x$qux
Error in stopc("Unknown column '", i, "'") : Unknown column 'qux'
5: traceback(1)
4: stop(..., domain = NA) at utils.r#53
3: stopc("Unknown column '", i, "'") at tbl-df.r#37
2: `$.tbl_df`(x, qux)
1: x$qux

We should do tidyverse/tibble#102 as an alternative.

hadley · 2016-06-18T15:21:32Z

We'll see. Having [ always return a data frame caused some problems in the early days too. But I'm confident that in the long run it will fix more problems than it creates.

ghost · 2016-06-19T22:30:02Z

has_column() should be included when this change goes live, that's a really good idea

But also, don't you think that this change will cause a lot of confusion considering that base::data.frame returns NULL when a column isn't found? I agree that this should never have been how this works (should have always been an error), but isn't going going to break a lot of legacy code?

kevinushey · 2016-06-19T22:53:52Z

Perhaps tbls should stop inheriting from data.frame since they don't preserve the semantics of the underlying interface. (he said, snarkily)

To be honest, though, I don't like this change at all -- I think it's well understood that $ returns NULL when the element does not exist. This is how $ generally works across all R types -- why start changing those semantics now?

krlmlr · 2016-06-20T06:55:54Z

I got hurt badly once when trying to replace all data frames by data tables in existing code. Still, data tables are data frames.

I think we should be fine if dplyr makes sure that all operations return data frames from df input, and tibbles from tibble input. Switching to tibble will then be a choice made consciously, with all its alleged pros and cons.

Are there currently operations that return a tibble when passed a df?

hadley · 2016-06-20T11:46:21Z

Hmmmmm, maybe I didn't think this fully through - I didn't think about the use case of is.null(df$x). Personally, I never use this pattern (as I think it's relying on a subtle bug in R), but I agree that it is common in practice. This will become more problematic since more packages are importing tibble, not just suggesting it.

OTOH, there are lots of subtle bugs that people are probably currently missing:

df <- data.frame(xyz =1:3)
is.null(df$x)
# [1] FALSE

And the current behaviour of $ is particularly annoying when used interactively as recycling will mean that the error you get is typically not the root cause.

hadley · 2016-06-20T11:50:03Z

In my mind, this change might potentially require some changes to user code, but it forces an early error in many cases where you currently get a late error.

Somewhat unusually (compared to the rest of tibble), this change is most important when working interactively, and less important when programming.

hadley · 2016-06-20T11:56:19Z

I had thought that this was only in the dev version of tibble, but it is in the released version, which has been available for ~2 months. This is the first complaint I've heard, but that might be because most people are exposed to tibbles through dplyr.

Maybe we should not throw an error for [[ as that's a more "programmatic" interface, and might be more used by packages. $ would continue to protect users from inadvertent interactive errors.

hadley · 2016-06-20T11:57:56Z

Then we'd have:

[: use when you want a smaller tibble
[[: use when you want a single column (returns null)
$: use when you want a single column (throws error)

That seems like a reasonable set of principles, and a reasonable compromise with the current behaviour.

krlmlr · 2016-06-20T11:59:49Z

Not the other way round -- [[ errs and $ gives NULL?

krlmlr · 2016-06-20T12:01:28Z

Ah, I see. The original complaint was about that $ throws an error, though.

krlmlr · 2016-06-20T12:04:42Z

I'd argue that if an operator is used in packages, we want to have an error, just like the current behavior of [ and [[. Errors in packages will be detected with R CMD check, too. The situation may be different with analysis scripts.

hadley · 2016-06-20T12:34:16Z

I think we do want to have an error but we need some easy way that people can preserve existing behaviour - relying on a function in tibble isn't sufficient, because the package might not use tibble. We could encourage people to use "x" %in% names(df) but switching from is.null(df$x) to is.null(df[["x"]]) is easier (and for regular data frames also protects against partial matching).

I've been thinking about framing tibbles as lazy, curmudgeonly data frames, i.e. they only do the minimal amount of work (they never coerce) and they complain whenever something isn't as they expect.

cboettig · 2016-06-21T01:52:56Z

I agree with @kevinushey that this change is surprising given the usual behavior of $ and the expectation that tibbles inherit data.frame behavior. I've had some code break in a subtle way due to the last tidyrrelease causing this behavior to be adopted in output data.frames, which is particularly confusing since dplyr still has the old behavior, making for conflicting behavior of tbl_df (though glad this week's release of dplyr should fix that).

Still, an opt-out option could at least ease the transition.

hadley · 2016-06-27T19:03:08Z

Closing here since implementation needs to happen in tibble.

hadley closed this as completed Jun 18, 2016

hadley reopened this Jun 20, 2016

krlmlr mentioned this issue Jun 20, 2016

Why did tibble decide to throw error instead of return NULL when an undefined column is requested? tidyverse/tibble#91

Closed

hadley closed this as completed Jun 27, 2016

hadley mentioned this issue Jun 27, 2016

Back out $ throwing error tidyverse/tibble#109

Closed

pitakakariki mentioned this issue Jan 18, 2017

lmList gives warnings with tibbles lme4/lme4#406

Closed

lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dplyr and tibble collision possible bug #1936

dplyr and tibble collision possible bug #1936

ghost commented Jun 17, 2016 •

edited by ghost

Loading

hadley commented Jun 18, 2016

kevinushey commented Jun 18, 2016 •

edited

Loading

ghost commented Jun 18, 2016

krlmlr commented Jun 18, 2016 •

edited

Loading

hadley commented Jun 18, 2016

ghost commented Jun 19, 2016 •

edited by ghost

Loading

kevinushey commented Jun 19, 2016

krlmlr commented Jun 20, 2016

hadley commented Jun 20, 2016

hadley commented Jun 20, 2016

hadley commented Jun 20, 2016

hadley commented Jun 20, 2016

krlmlr commented Jun 20, 2016

krlmlr commented Jun 20, 2016

krlmlr commented Jun 20, 2016

hadley commented Jun 20, 2016 •

edited

Loading

cboettig commented Jun 21, 2016

hadley commented Jun 27, 2016

dplyr and tibble collision possible bug #1936

dplyr and tibble collision possible bug #1936

Comments

ghost commented Jun 17, 2016 • edited by ghost Loading

hadley commented Jun 18, 2016

kevinushey commented Jun 18, 2016 • edited Loading

ghost commented Jun 18, 2016

krlmlr commented Jun 18, 2016 • edited Loading

hadley commented Jun 18, 2016

ghost commented Jun 19, 2016 • edited by ghost Loading

kevinushey commented Jun 19, 2016

krlmlr commented Jun 20, 2016

hadley commented Jun 20, 2016

hadley commented Jun 20, 2016

hadley commented Jun 20, 2016

hadley commented Jun 20, 2016

krlmlr commented Jun 20, 2016

krlmlr commented Jun 20, 2016

krlmlr commented Jun 20, 2016

hadley commented Jun 20, 2016 • edited Loading

cboettig commented Jun 21, 2016

hadley commented Jun 27, 2016

ghost commented Jun 17, 2016 •

edited by ghost

Loading

kevinushey commented Jun 18, 2016 •

edited

Loading

krlmlr commented Jun 18, 2016 •

edited

Loading

ghost commented Jun 19, 2016 •

edited by ghost

Loading

hadley commented Jun 20, 2016 •

edited

Loading