Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify how to extend tibbles #275

Closed
hadley opened this issue Jul 5, 2017 · 20 comments · Fixed by #1512
Closed

Clarify how to extend tibbles #275

hadley opened this issue Jul 5, 2017 · 20 comments · Fixed by #1512
Milestone

Comments

@hadley
Copy link
Member

hadley commented Jul 5, 2017

By following the principles defined at http://adv-r.hadley.nz/s3.html#inheritance

  • Provide a constructor that can also be used to create subclassed objects

    new_tibble <- function(data, ..., subclass = NULL) {
      stopifnot(is.data.frame(data))
      
      structure(
        data,
        ...,
        class = c(subclass, "tibble", "tbl_df", "data.frame")
      )
    }
  • Provide a reconstruct method

     reconstruct.tibble <- function(new, old) {
       new_tibble(new)
     }
  • Call S3::reconstruct() in all methods that return a tibble

This supersedes #155, #211, #218.

@hadley
Copy link
Member Author

hadley commented Jul 5, 2017

Also connected to making print methods more extensible.

@hadley
Copy link
Member Author

hadley commented Oct 4, 2017

We should verify that these changes are helpful by looking into how it affects googledrive.

@DavisVaughan
Copy link
Member

For what it's worth, I rewrote most of tibbletime to use sloop::reconstruct(). It seems to work for most cases, but I also needed a maybe_reconstruct() function that just returns a tibble and not a tbl_time object (the class I'm building on top of tibble) if certain conditions aren't met after the dplyr function is run.

For example, I rely on having a date/datetime index column in the tibble. If that column is lost, there is no reason to be a tbl_time object. So when calling transmute(), if the resulting tibble doesn't have that index column, then I shouldn't call reconstruct() but should instead just return that tibble.

Also, I'm sure you're aware but we would need a way to extend grouped_df too. Both Thomas in tidygraph and I use grouped_tbl_graph and grouped_tbl_time style objects so a formal way to extend them would be neat.

@krlmlr
Copy link
Member

krlmlr commented Nov 7, 2017

Thanks. I wonder if you could implement your reconstruct() method accordingly, so that your class is added only if conditions are met? Can you point me to your implementation?

Verbs that operate on grouped data also should call reconstruct() at the end.

I'll postpone writing this vignette until after CRAN release of tibble + pillar, because otherwise we'll need to wait for sloop as well.

@DavisVaughan
Copy link
Member

I think you're right that I could put the checks inside reconstruct() rather than include a separate function. They are cheap and quick checks so it wouldn't be a problem. You can find the reconstruct() function here. The appropriate new_*() functions are here.


My main question on grouped data is a bit convoluted, but bare with me. No rush on needing an answer.

Assume I have created a tbl_time object from a tbl_df object. Inheritance:
c("tbl_time", "tbl_df", "tbl", "data.frame")

Now, I want to additionally group the data by a column. I assume the inheritance should look like:
c("grouped_tbl_time", "tbl_time", "grouped_df", "tbl_df", "tbl", "data.frame")

The way I do this, attempting to follow the new Advanced R material, is as follows:

  1. Call group_by.tbl_time(). For this to work, I return the data to just a tbl_df() and call the group_by() for that. Then I construct my grouped_tbl_time on top of the result. I don't think reconstruct() is appropriate here, because the original .data is not a grouped_df
group_by.tbl_time <- function(.data, ..., add = FALSE) {
  # Normal group then pass to grouped_tbl_time helper
  quos <- rlang::quos(...)
  .data_grouped <- dplyr::group_by(as_tibble(.data), !!! quos, add = add)
  grouped_tbl_time(.data_grouped, !! get_index_quo(.data))
}
  1. The above code calls the grouped_tbl_time() helper, which finds the pieces necessary to create the tbl_time object, and calls new_grouped_tbl_time(args).

  2. new_grouped_tbl_time() then calls new_tbl_time(args, ..., subclass = "grouped_tbl_time")

  3. This is where I get confused. At this stage, new_tbl_time() has to decide between calling new_grouped_df() or new_tibble(). The way I do this is to check if we are working with a grouped_df() at the moment or not. (I have my own versions of new_grouped_df() and new_tibble() at the bottom of the page here.)

Is that right? I don't see any other way to get the inheritance correct unless maybe I have new_grouped_tbl_time() instead call new_tbl_time(args, ..., subclass = c("grouped_tbl_time", "grouped_df"). The inheritance would then be:
c("grouped_tbl_time", "grouped_df", "tbl_time", "tbl_df", "tbl", "data.frame")
but maybe that makes more sense?

@krlmlr
Copy link
Member

krlmlr commented Nov 12, 2017

I wonder if you can get away with simply removing the group_by.tbl_time() method and implementing reconstruct.tbl_time() to add the "grouped_tbl_time" class if the result has the "grouped_df" class.

#' @export
reconstruct.tbl_time <- function(new, old) {
  if ("grouped_df" %in% class(new)) class(new) <- c("grouped_tbl_time", class(new))
  ...
  new
}

On a side note, it seems that you can simply pass along ... to dplyr::group_by() in your (maybe unneeded) implementation of group_by.tbl_time().

@DavisVaughan
Copy link
Member

DavisVaughan commented Nov 12, 2017

I think your suggestion removes a lot of unnecessary work on my end. I really appreciate you taking a look at it.

I saw your latest PR, removing ... in new_tibble(). Aren't the ellipses exactly how someone like me might extend a tibble's attributes? The essence of new_tbl_time() is (before your new update):

new_tbl_time <- function(x, index_quo, index_time_zone, ..., subclass = NULL) {
    tibble::new_tibble(
      x,
      index_quo       = index_quo,
      index_time_zone = index_time_zone,
      subclass        = c(subclass, "tbl_time")
    )
}

Where index_quo and index_time_zone are passed through the ... to become attributes.

I guess the alternative is for me to assign the attributes to x in new_tbl_time(), then pass x to new_tibble()?


Support from Advanced R (not trying to take this as gospel by bringing it up so often, its just what ive been going off and it seems to make sense to me).

"To allow subclasses, the parent constructor needs to have ... and subclass arguments" -here

It provides the below example, where the new z attribute is added through the ....

new_my_class <- function(x, y, ..., subclass = NULL) {
  stopifnot(is.numeric(x))
  stopifnot(is.logical(y))
  
  structure(
    x,
    y = y,
    ...,
    class = c(subclass, "my_class")
  )
}

new_subclass <- function(x, y, z) {
  stopifnot(is.character(z))
  new_my_class(x, y, z, subclass = "subclass")
}

@krlmlr
Copy link
Member

krlmlr commented Nov 12, 2017

Fine with adv-r as the reference ;-) . I looked in a previous subsection, https://adv-r.hadley.nz/s3.html#constructors, and I haven't found the ellipsis there. It's easy to add back, though.

@DavisVaughan
Copy link
Member

I'm actually surprised he doesn't do it there. In theory new_Date() should (I think?) be extensible so people could build on top of it. new_Date() and new_s3_dbl() are both in sloop, and it would be easy to do:

new_Date <- function(x, ..., subclass = NULL) {
  sloop::new_s3_dbl(x, ..., class = c(subclass, "Date"))
}

rather than just:

new_Date <- function(x) {
  sloop::new_s3_dbl(x, class = "Date")
}

My guess is that he is just introducing the topic for the first time there in 13.2.1 of Advanced R, and it might be overwhelming to show everything (inheritance) all at once. Which is why the ... and subclass are introduced later in 13.5 Inheritance. Just a thought ¯\(ツ)

@krlmlr
Copy link
Member

krlmlr commented Nov 13, 2017

I guess a forward reference would help impatient and superficial readers like myself, but that's a tough balance.

@DavisVaughan
Copy link
Member

If we are being really picky, the tibble specific attributes go before the ... (see new_my_class() above). Just thought I'd point it out!

function (x, ..., nrow = NULL, subclass = NULL) 

# VS

function (x, nrow = NULL, ..., subclass = NULL) 

Agreed. Effective teaching is hard work.

@krlmlr
Copy link
Member

krlmlr commented Nov 13, 2017

This was deliberate, users shouldn't be calling new_tibble(x, 3) anyway, and it helps changing the interface later if necessary.

@jennybc
Copy link
Member

jennybc commented Nov 15, 2017

@krlmlr As someone who has subclassed tibble using DIY methods, what's the status here? Would it now be welcome for people to test out the more official ways of doing this? Or not quite yet?

@krlmlr
Copy link
Member

krlmlr commented Nov 15, 2017

We now have new_tibble() in the upcoming version (releasing today), but we still need sloop::reconstruct() for proper extensibility.

@mbojan
Copy link

mbojan commented Dec 4, 2017

I look forward to trying these extensions out. I have to say that current behavior of dropping attributes (and subclasses for that matter) seems inconsistent across R. For example subset drops attributes but maintains the class attribute (including subclasses, if any). dplyr verbs drop both. If I recall discussions around S3 and S4 systems correctly, the positions seemed to be that the default behavior should be: when a superclass method is applied to a subclass, the result should be still of subclass. That's how e.g. data frame indexing works. I think if we move away from this, it will be necessary to write a lot of vacuous code, like subclass-dedicated methods for all generics that already work with the superclass...

@krlmlr
Copy link
Member

krlmlr commented Jan 18, 2018

Also need a statement about (non-)support of S4 classes.

@krlmlr krlmlr added this to the 1.5.0 milestone Oct 5, 2018
@hadley
Copy link
Member Author

hadley commented Oct 6, 2018

I think this should wait until the next release so we can fully internalise the lessons from vctrs.

@krlmlr
Copy link
Member

krlmlr commented Mar 2, 2020

We should move the existing "Extending" vignette to {vctrs}, and adapt to the new infrastructure.

@krlmlr krlmlr added this to the 3.1.1 milestone Feb 23, 2021
@krlmlr
Copy link
Member

krlmlr commented Apr 18, 2021

Topics:

  • Tweak overall printing: link to pillar vignette
  • Tweak printing of vector column: link to vctrs vignette
  • Inherit from tbl or tbl_df?
  • Main part of the text: Follow ?dplyr_reconstruct
    • Example classes: sticky column, ...
    • Best practices for inheritance, what works out of the box, what needs adaptation?

@krlmlr krlmlr modified the milestones: 3.1.2, 3.1.3 Jun 24, 2021
@krlmlr krlmlr modified the milestones: 3.1.3, 3.1.4 Jul 17, 2021
@krlmlr krlmlr removed this from the 3.1.4 milestone Aug 1, 2021
@krlmlr
Copy link
Member

krlmlr commented Oct 25, 2021

Are we waiting for #890 here?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants