From b497da2435e5976a16a02a5758da4587c707e36c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kirill=20M=C3=BCller?= <kirill.mueller@ivt.baug.ethz.ch>
Date: Thu, 10 Mar 2016 20:08:57 +0100
Subject: [PATCH 1/4] copy part of dplyr's vignette

---
 DESCRIPTION               |   4 +-
 vignettes/data_frames.Rmd | 156 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 159 insertions(+), 1 deletion(-)
 create mode 100644 vignettes/data_frames.Rmd

diff --git a/DESCRIPTION b/DESCRIPTION
index 027627ebe..3392a6d98 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -15,8 +15,10 @@ URL: https://github.com/krlmlr/tibble
 BugReports: https://github.com/krlmlr/tibble/issues
 Depends: R (>= 3.1.2)
 Imports: methods, assertthat, utils, lazyeval (>= 0.1.10), Rcpp
-Suggests: testthat, knitr, Lahman (>= 3.0.1)
+Suggests: testthat, knitr, Lahman (>= 3.0.1),
+    rmarkdown
 LazyData: yes
 License: MIT + file LICENSE
 RoxygenNote: 5.0.1
 LinkingTo: Rcpp
+VignetteBuilder: knitr
diff --git a/vignettes/data_frames.Rmd b/vignettes/data_frames.Rmd
new file mode 100644
index 000000000..c57100d15
--- /dev/null
+++ b/vignettes/data_frames.Rmd
@@ -0,0 +1,156 @@
+---
+title: "Data frames"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Data frames}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+```{r, echo = FALSE, message = FALSE}
+knitr::opts_chunk$set(collapse = T, comment = "#>")
+options(tibble.print_min = 4L, tibble.print_max = 4L)
+library(dplyr)
+```
+
+## Creating
+
+`data_frame()` is a convenient way to create data frames. It encapsulates best practices for data frames:
+
+  * It never changes an input's type (i.e., no more `stringsAsFactors = FALSE`!).
+    
+    ```{r}
+    data.frame(x = letters) %>% sapply(class)
+    data_frame(x = letters) %>% sapply(class)
+    ```
+    
+    This makes it easier to use with list-columns:
+    
+    ```{r}
+    data_frame(x = 1:3, y = list(1:5, 1:10, 1:20))
+    ```
+    
+    List-columns are most commonly created by dplyr's `do()`, but it can be useful to
+    create them by hand.
+      
+  * It never adjusts the names of variables:
+  
+    ```{r}
+    data.frame(`crazy name` = 1) %>% names()
+    data_frame(`crazy name` = 1) %>% names()
+    ```
+
+  * It evaluates its arguments lazily and sequentially:
+  
+    ```{r}
+    data_frame(x = 1:5, y = x ^ 2)
+    ```
+
+  * It adds the `tbl_df` class to the output, so that if you accidentally print a large
+    data frame you only get the first few rows.
+    
+    ```{r}
+    data_frame(x = 1:5) %>% class()
+    ```
+
+  * It changes the behaviour of `[` to always return the same type of object:
+    subsetting with `[` always returns a `tbl_df` object; subsetting with
+    `[[` always returns a column.
+    
+    Be aware that this is one case where subsetting a `tbl_df` object
+    produces a different result than subsetting a `data.frame`:
+  
+    ```{r}
+    df <- data.frame(a = 1:2, b = 1:2)
+    str(df[, "a"])
+    
+    tbldf <- tbl_df(df)
+    str(tbldf[, "a"])
+    ```
+    
+  * It never uses `row.names()`. The whole point of tidy data is to
+    store variables in a consistent way, so it never stores a variable as
+    a special attribute.
+  
+  * It only recycles vectors of length 1, because recycling longer vectors
+    is a frequent source of bugs.
+
+## Coercion
+
+To complement `data_frame()`, tibble provides `as_data_frame()` to coerce lists into data frames. It does two things:
+
+* It checks that the input list is valid for a data frame, i.e. that each element
+  is named, is a 1d atomic vector or list, and all elements have the same 
+  length.
+  
+* It sets the class and attributes of the list to make it behave like a data frame.
+  This modification does not require a deep copy of the input list, so it's
+  very fast.
+  
+This is much simpler than `as.data.frame()`. It's hard to explain precisely what `as.data.frame()` does, but it's similar to `do.call(cbind, lapply(x, data.frame))`, i.e. it coerces each component to a data frame and then `cbind()`s them all together. Consequently, `as_data_frame()` is much faster than `as.data.frame()`:
+
+```{r}
+l2 <- replicate(26, sample(100), simplify = FALSE)
+names(l2) <- letters
+microbenchmark::microbenchmark(
+  as_data_frame(l2),
+  as.data.frame(l2)
+)
+```
+
+The speed of `as.data.frame()` is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.
+
+## tbl_dfs vs data.frames
+
+There are three key differences between tbl_dfs and data.frames:
+
+*   When you print a tbl_df, it only shows the first ten rows and all the
+    columns that fit on one screen. It also prints an abbreviated description
+    of the column type:
+    
+    ```{r}
+    data_frame(x = 1:1000)
+    ```
+    
+    You can control the default appearance with options:
+    
+    * `options(tibble.print_max = n, tibble.print_min = m)`: if there are more
+      than `n` rows, print only `m` rows. Use `options(tibble.print_max = Inf)`
+      to always show all rows.
+    
+    * `options(tibble.width = Inf)` will always print all columns, regardless
+      of the width of the screen.
+
+    
+*   When you subset a tbl\_df with `[`, it always returns another tbl\_df. 
+    Contrast this with a data frame: sometimes `[` returns a data frame and
+    sometimes it just returns a single column:
+    
+    ```{r}
+    df1 <- data.frame(x = 1:3, y = 3:1)
+    class(df1[, 1:2])
+    class(df1[, 1])
+    
+    df2 <- data_frame(x = 1:3, y = 3:1)
+    class(df2[, 1:2])
+    class(df2[, 1])
+    ```
+    
+    To extract a single column, use `[[` or `$`:
+    
+    ```{r}
+    class(df2[[1]])
+    class(df2$x)
+    ```
+
+*   When you extract a variable with `$`, tbl\_dfs never do partial 
+    matching. They'll throw an error if the column doesn't exist:
+    
+    ```{r, error = TRUE}
+    df <- data.frame(abc = 1)
+    df$a
+    
+    df2 <- data_frame(abc = 1)
+    df2$a
+    ```

From f0681f77b49555a909647aa2f990fd2962a1a327 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kirill=20M=C3=BCller?= <kirill.mueller@ivt.baug.ethz.ch>
Date: Thu, 10 Mar 2016 20:12:08 +0100
Subject: [PATCH 2/4] tibble

---
 vignettes/data_frames.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vignettes/data_frames.Rmd b/vignettes/data_frames.Rmd
index c57100d15..24c31021b 100644
--- a/vignettes/data_frames.Rmd
+++ b/vignettes/data_frames.Rmd
@@ -11,7 +11,7 @@ vignette: >
 ```{r, echo = FALSE, message = FALSE}
 knitr::opts_chunk$set(collapse = T, comment = "#>")
 options(tibble.print_min = 4L, tibble.print_max = 4L)
-library(dplyr)
+library(tibble)
 ```
 
 ## Creating

From e2e9111589335fab35b17cfa24d63fe52e6fd8a9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kirill=20M=C3=BCller?= <kirill.mueller@ivt.baug.ethz.ch>
Date: Thu, 10 Mar 2016 20:17:33 +0100
Subject: [PATCH 3/4] need magrittr

---
 DESCRIPTION               | 7 +++++--
 vignettes/data_frames.Rmd | 1 +
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 3392a6d98..62fdf94a2 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -15,8 +15,11 @@ URL: https://github.com/krlmlr/tibble
 BugReports: https://github.com/krlmlr/tibble/issues
 Depends: R (>= 3.1.2)
 Imports: methods, assertthat, utils, lazyeval (>= 0.1.10), Rcpp
-Suggests: testthat, knitr, Lahman (>= 3.0.1),
-    rmarkdown
+Suggests: testthat,
+    knitr,
+    rmarkdown,
+    Lahman (>= 3.0.1),
+    magrittr
 LazyData: yes
 License: MIT + file LICENSE
 RoxygenNote: 5.0.1
diff --git a/vignettes/data_frames.Rmd b/vignettes/data_frames.Rmd
index 24c31021b..116cac422 100644
--- a/vignettes/data_frames.Rmd
+++ b/vignettes/data_frames.Rmd
@@ -11,6 +11,7 @@ vignette: >
 ```{r, echo = FALSE, message = FALSE}
 knitr::opts_chunk$set(collapse = T, comment = "#>")
 options(tibble.print_min = 4L, tibble.print_max = 4L)
+library(magrittr)
 library(tibble)
 ```
 

From fd5fa0fa54c0002faf7f834249f5c95fff0162dd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Kirill=20M=C3=BCller?= <kirill.mueller@ivt.baug.ethz.ch>
Date: Thu, 10 Mar 2016 20:44:27 +0100
Subject: [PATCH 4/4] microbenchmark

---
 DESCRIPTION | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 62fdf94a2..d10ff98e2 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -19,7 +19,8 @@ Suggests: testthat,
     knitr,
     rmarkdown,
     Lahman (>= 3.0.1),
-    magrittr
+    magrittr,
+    microbenchmark
 LazyData: yes
 License: MIT + file LICENSE
 RoxygenNote: 5.0.1