Add $ and [[ methods for tbl_df

tidyverse · Nov 3, 2015 · d5c1a63
1 parent 4ea9475
commit d5c1a63

Showing 5 changed files with 35 additions and 4 deletions.
diff --git a/NAMESPACE b/NAMESPACE
@@ -1,7 +1,9 @@
 # Generated by roxygen2: do not edit by hand
 
+S3method("$",tbl_df)
 S3method("[",grouped_df)
 S3method("[",tbl_df)
+S3method("[[",tbl_df)
 S3method(all.equal,tbl_df)
 S3method(all.equal,tbl_dt)
 S3method(anti_join,data.frame)

diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,8 @@
 # dplyr 0.4.3.9000
 
+* `tbl_df`s gain `$` and `[[` methods that are ~5x faster than the defaults,
+  and never do partial matching (#1504).
+
 * `all_equal()` lets you compare data frames ignoring row and column order,
   and optionally ignoring minor differences in type (e.g. int vs. double) 
   (#821).

diff --git a/R/tbl-df.r b/R/tbl-df.r
@@ -7,14 +7,17 @@
 #'
 #' @section Methods:
 #'
-#' \code{tbl_df} implements two important base methods:
+#' \code{tbl_df} implements four important base methods:
 #'
 #' \describe{
 #' \item{print}{Only prints the first 10 rows, and the columns that fit on
 #'   screen}
 #' \item{\code{[}}{Never simplifies (drops), so always returns data.frame}
+#' \item{\code{[[}, \code{$}}{Call \code{\link{.subset2}} directly, so are
+#'   considerably faster and never do partial matching.}
 #' }
 #'
+#'
 #' @export
 #' @param data a data frame
 #' @examples
@@ -119,7 +122,17 @@ print.tbl_df <- function(x, ..., n = NULL, width = NULL) {
 }
 
 #' @export
-`[.tbl_df` <- function (x, i, j, drop = FALSE) {
+`[[.tbl_df` <- function(x, i) {
+  .subset2(x, i)
+}
+
+#' @export
+`$.tbl_df` <- function(x, i) {
+  .subset2(x, i)
+}
+
+#' @export
+`[.tbl_df` <- function(x, i, j, drop = FALSE) {
   if (missing(i) && missing(j)) return(x)
   if (drop) warning("drop ignored", call. = FALSE)
 

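The effect of the two new methods can be reproduced in base R without dplyr. The following is a minimal sketch, not the package's code path: the class name `my_tbl` is hypothetical and stands in for `tbl_df`, while the method bodies mirror the `.subset2()` calls in the hunk above.

```r
# Hypothetical class "my_tbl" stands in for tbl_df so this runs in base R.
# .subset2() is the internal form of `[[`: it skips method dispatch and
# matches names exactly by default.
`[[.my_tbl` <- function(x, i) {
  .subset2(x, i)
}

`$.my_tbl` <- function(x, i) {
  # For `$` methods, the name after the `$` arrives as a character string.
  .subset2(x, i)
}

# Plain data.frame: `$` partially matches "a" to the column "abc".
df <- data.frame(abc = 1)
df$a          # 1

# Same data with the custom class in front: exact matching only.
df2 <- structure(data.frame(abc = 1), class = c("my_tbl", "data.frame"))
df2$a         # NULL
df2$abc       # 1
df2[["abc"]]  # 1
```

Because the methods forward to `.subset2()` without further checks, a misspelled or abbreviated column name yields `NULL` rather than a silently matched column.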
diff --git a/man/tbl_df.Rd b/man/tbl_df.Rd
diff --git a/vignettes/data_frames.Rmd b/vignettes/data_frames.Rmd
@@ -103,7 +103,7 @@ The speed of `as.data.frame()` is not usually a bottleneck when used interactive
 
 ## tbl_dfs vs data.frames
 
-There are two key differences between tbl_dfs and data.frames:
+There are three key differences between tbl_dfs and data.frames:
 
 *   When you print a tbl_df, it only shows the first ten rows and all the
     columns that fit on one screen. It also prints an abbreviated description
@@ -144,6 +144,17 @@ There are two key differences between tbl_dfs and data.frames:
     class(df2$x)
     ```
 
+*   When you extract a variable with `$`, tbl\_dfs never do partial 
+    matching:
+    
+    ```{r}
+    df <- data.frame(abc = 1)
+    df$a
+    
+    df2 <- data_frame(abc = 1)
+    df2$a
+    ```
+
 ## Memory
 
 One of the reasons that dplyr is fast is that it is very careful about when it makes copies. This section describes how this works, and gives you some useful tools for understanding the memory usage of data frames in R.