FR: Make a data frame from a (possibly named) vector or list #31

jennybc · 2016-03-01T19:58:45Z

Here's something I do fairly often, mostly with a list, but sometimes with a vector: initialize a data frame with that list or vector as a variable and, at the same time, promote its names to a proper variable. Or, perhaps, add a variable of row numbers. Why is it so important to add the names or row numbers? Because later you'll want to process the data frame with tidyr, i.e. with unnest() and/or spread().

I could point to some real uses if I need to really sell this. But hopefully this will just make sense. Or someone will tell me it's already easy to do? It is already easy, but perhaps worth making a function for.

library(tibble)

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

## wish it were easy to make the names a proper variable
data_frame(id = names(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

## where id can easily default to row number
data_frame(id = seq_along(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (int)   (list)
#> 1     1 <chr[1]>
#> 2     2 <chr[1]>
#> 3     3 <chr[1]>
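The request boils down to one small helper. As a minimal base-R sketch (the name `list_to_df()` is hypothetical and not part of tibble), the following promotes `names(x)` to an `id` column when names exist and falls back to `seq_along(x)` otherwise, keeping a list input as a list column `thing`:

```r
# Hypothetical helper (not part of tibble): build a two-column data frame
# whose `id` column holds names(x), or row numbers if x is unnamed,
# and whose `thing` column holds the elements of x.
list_to_df <- function(x) {
  id <- if (is.null(names(x))) seq_along(x) else names(x)
  out <- data.frame(id = id, stringsAsFactors = FALSE)
  # Assigning with [[<- keeps a list input as a single list column.
  out[["thing"]] <- unname(x)
  out
}

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
list_to_df(x)                # id holds "alpha", "beta", "gamma"
list_to_df(unname(x))        # id holds 1, 2, 3
list_to_df(c(a = 1, b = 2))  # atomic input gives an atomic `thing` column
```

Assigning the values after construction is a deliberate choice: passing a list straight to `data.frame()` would spread its elements across several columns instead of one list column.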


krlmlr · 2016-03-02T10:31:47Z

What verb would you use for this operation?

How about:

x %>% as_data_frame %>% tidyr::gather(id, thing)

If x is a vector, it needs to be coerced to a list first. The following doesn't work:

x %>% tidyr::gather(id, thing)

I think this could be fixed by implementing gather_.list <- function(x, ...) gather_(tibble::as_data_frame(x), ...).

EDIT: The above also doesn't work if x is unnamed, but here you could use x %>% data_frame(thing=.) %>% add_rownames.

EDIT²: I think a new verb would help here, but I'm not sure whether it belongs in tibble or in tidyr.
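For the named case, base R's `stack()` already performs this wide-to-long step: it returns a `values` column and a factor `ind` holding the names. This is a sketch for comparison only; unlike the tidyr route, it requires a named list of atomic vectors and returns a plain data frame rather than a tibble:

```r
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

# stack() turns each named element into rows:
# `values` holds the elements, `ind` holds the names as a factor.
long <- stack(x)
long
```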

jennybc · 2016-03-02T15:37:17Z

I will try to propose a name.

jennybc · 2016-03-07T21:19:42Z

The more I think about it, maybe it makes sense to think about this as a treatment applied to a variable during the construction of a data frame. A way to say "add this variable AND promote its names to a proper variable". And also give some nice way of getting row numbers into the data frame? I have found dplyr::row_number() to be quite confusing / disappointing.

I realize id() is probably already overloaded with meaning. upname() isn't great either, but hopefully it conveys the idea.

library(tibble)
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

What if something like this:

df <- data_frame(id(x, "greek"))
## or
df <- data_frame(upname(x, "greek"))

produced this result:

data_frame(greek = names(x), x = x)
#> Source: local data frame [3 x 2]
#> 
#>   greek        x
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

I also wish it were easier to get plain row numbers. I wish this were what row_number() did, but clearly it does not. So, again, fiction! What if something like this:

df <- data_frame(i = row_number(), upname(x, "greek"))

produced something like this:

data_frame(i = seq_along(x),
           greek = names(x),
           x = x)
#> Source: local data frame [3 x 3]
#> 
#>       i greek        x
#>   (int) (chr)   (list)
#> 1     1 alpha <chr[1]>
#> 2     2  beta <chr[1]>
#> 3     3 gamma <chr[1]>
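Outside of `data_frame()`, the proposed behaviour can be approximated with a standalone function. A base-R sketch, where `upname()` is the hypothetical name from this comment, and naming the value column after the argument expression via `deparse(substitute(x))` is an assumption about how that column name would be chosen:

```r
# Hypothetical upname(): promote names(x) to a column called `name`,
# and store the values in a column named after the expression passed as x.
upname <- function(x, name = "name") {
  stopifnot(!is.null(names(x)))
  value_col <- deparse(substitute(x))  # "x" when called as upname(x, ...)
  out <- data.frame(names(x), stringsAsFactors = FALSE)
  names(out) <- name
  out[[value_col]] <- unname(x)        # a list input stays a list column
  out
}

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
greek_df <- upname(x, "greek")
# Prepend plain row numbers, matching the desired i / greek / x layout
greek_df$i <- seq_len(nrow(greek_df))
greek_df <- greek_df[c("i", "greek", "x")]
greek_df
```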

jennybc · 2016-03-07T21:45:48Z

In addition to id() and upname(), dub() is a possible name for this name-promoting variable-pre-processing function.

krlmlr · 2016-03-07T22:28:19Z

Row numbers: Have you seen #11? Your example would then be:

data_frame(...) %>% rownames_to_column("i")

What about this:

dub <- function(x) as_data_frame(setNames(list(names(x), x), c("name", "value")))

This allows at least the creation of a two-column data frame from a named object, which then can be massaged further with the other dplyr verbs, and combined with other data frames using cbind().

@hadley: Would this perhaps be suitable for purrr:

unzip_names <- function(x) set_names(list(x, names(x)), c("name", "value"))
zip_names <- function(x) set_names(x[[1]], x[[2]])

jennybc · 2016-03-08T01:49:54Z

Sorry, I can't really tell what #11 does just from reading the discussion. But I take your word for it that it would add the integers 1 through nrow(.) as variable i. Would it be added as the first or the last variable? It feels like you usually want it at the very front.

library(tibble)
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
dub <- function(x) as_data_frame(setNames(list(names(x), x), c("name", "value")))
dub(x)
#> Source: local data frame [3 x 2]
#> 
#>    name    value
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

The variables themselves and the object look great. Would dub() gain some arguments or different defaults in order to produce less generic names?

UPDATE: I think x and names(x) are reversed in those speculative purrr functions.

hadley · 2016-03-08T02:10:44Z

What if this was just the as_data_frame() method for vectors?

krlmlr · 2016-03-08T06:30:29Z

@jennybc: purrr: Right, revised definition below.

unzip_names <- function(x) set_names(list(names(x), x), c("name", "value"))
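With the revised `unzip_names()`, the names come first, so a matching `zip_names()` would read the values from the `value` element and the names from the `name` element. A base-R sketch of the pair, using `stats::setNames()` in place of `purrr::set_names()` and `unname()` so the round trip is exact:

```r
# Split a named object into its names and its (unnamed) values.
unzip_names <- function(x) {
  list(name = names(x), value = unname(x))
}

# Inverse operation: reattach the names to the values.
zip_names <- function(x) {
  setNames(x[["value"]], x[["name"]])
}

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
parts <- unzip_names(x)
parts$name                      # "alpha" "beta" "gamma"
identical(zip_names(parts), x)  # the round trip recovers the original list
```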

rownames_to_column() will add to the front. Actually, we already have add_rownames() in dplyr, but it does "too much" and will be deprecated in favor of the new functions.
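For the plain row-number case, a base-R sketch that prepends an integer column; `add_row_numbers()` is a hypothetical helper, and unlike `rownames_to_column()`, which copies the (character) row names, it always stores integers:

```r
# Hypothetical helper: add integer row numbers as the first column.
add_row_numbers <- function(df, var = "i") {
  df[[var]] <- seq_len(nrow(df))
  # Reorder so the new column comes first, keeping the others in place.
  df[c(var, setdiff(names(df), var))]
}

greek_df <- data.frame(greek = c("alpha", "beta", "gamma"),
                       stringsAsFactors = FALSE)
add_row_numbers(greek_df)
```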

Defaults: I think we should support them, even if the renaming could be handled with a simple rename() step. tidyr::gather() has them too.

@hadley: Is there a dispatch for vector, as in as_data_frame.vector()? Otherwise it looks like we need to have as_data_frame.logical(), as_data_frame.character(), ...; we also need as_data_frame.Date(), as_data_frame.POSIXt(), ..., all with the same implementation. It also includes a certain amount of surprise -- if functions use as_data_frame() to convert input, the column names are auto-generated and the user might be unaware of it. Do you think it's worth it?

hadley · 2016-03-17T22:52:11Z

@krlmlr No, there's no "vector" virtual class, so implementation would be a bit tedious. But we don't need to have methods for Date and factor etc, because those will be caught by the method for the underlying atomic vector.

It seems like we're adding new functionality that previously was an error, so it doesn't seem too dangerous to me.

krlmlr · 2016-04-08T20:44:27Z

@jennybc: For now you could try kimisc::list_to_df() -- I totally forgot about this guy. I still think this should be part of tibble.

ijlyttle · 2016-04-15T01:56:44Z

Apologies if this is not helpful, but could purrr::map_df be useful?

library("dplyr")
library("purrr")

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

x %>% map_df(~ data_frame(thing = .x), .id = "name")

Could a new verb be put in place of the function within map_df?
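To make the mechanics of `map_df(..., .id = "name")` explicit, here is a base-R analogue (a swapped-in technique, not purrr itself): build one small data frame per element, record the element's name, then row-bind the pieces. It assumes, as in this example, that each element is a single string:

```r
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

# One single-row data frame per element, carrying the element's name.
pieces <- Map(
  function(nm, v) data.frame(name = nm, thing = v, stringsAsFactors = FALSE),
  names(x), x
)
# unname() keeps rbind() from turning the list names into row names.
out <- do.call(rbind, unname(pieces))
out
```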


@hadley: Do we want this in tibble (#31)?

krlmlr · 2016-05-07T21:10:37Z

Note that this is already possible with #71:

x %>% as_data_frame %>% rownames_to_column

- New `enframe()` that converts vectors to two-column tibbles (#31, #74).

- New `enframe()` that converts vectors to two-column tibbles (#31, #74).
- Fix compatibility with `knitr` 1.13 (#76).
- Implement `as_data_frame.default()` (#71, tidyverse/dplyr#1752).


Follow-up release.

- `tibble()` is no longer an alias for `frame_data()` (#82).
- Remove `tbl_df()` (#57).
- `$` returns `NULL` if column not found, without partial matching. A warning is given (#109).
- `[[` returns `NULL` if column not found (#109).
- Reworked output: More concise summary (begins with hash `#` and contains more text (#95)), removed empty line, showing number of hidden rows and columns (#51). The trailing metadata also begins with hash `#` (#101). Presence of row names is indicated by a star in printed output (#72).
- Format `NA` values in character columns as `<NA>`, like `print.data.frame()` does (#69).
- The number of printed extra cols is now an option (#68, @lionel-).
- Computation of column width properly handles wide (e.g., Chinese) characters; tests still fail on Windows (#100).
- `glimpse()` shows nesting structure for lists and uses angle brackets for type (#98).
- Tibbles with `POSIXlt` columns can be printed now; the text `<POSIXlt>` is shown as placeholder to encourage usage of `POSIXct` (#86).
- `type_sum()` shows only topmost class for S3 objects.
- Strict checking of integer and logical column indexes. For integers, passing a non-integer index or an out-of-bounds index raises an error. For logicals, only vectors of length 1 or `ncol` are supported. Passing a matrix or an array now raises an error in any case (#83).
- Warn if setting non-`NULL` row names (#75).
- Consistently surround variable names with single quotes in error messages.
- Use "Unknown column 'x'" as error message if column not found, like base R (#94).
- `stop()` and `warning()` are now always called with `call. = FALSE`.
- The `.Dim` attribute is silently stripped from columns that are 1d matrices (#84).
- Converting a tibble without row names to a regular data frame does not add explicit row names.
- `as_tibble.data.frame()` preserves attributes, and uses `as_tibble.list()` to avoid calling overridden methods, which may lead to endless recursion.
- New `has_name()` (#102).
- Prefer `tibble()` and `as_tibble()` over `data_frame()` and `as_data_frame()` in code and documentation (#82).
- New `is.tibble()` and `is_tibble()` (#79).
- New `enframe()` that converts vectors to two-column tibbles (#31, #74).
- `obj_sum()` and `type_sum()` show `"tibble"` instead of `"tbl_df"` for tibbles (#82).
- `as_tibble.data.frame()` gains `validate` argument (as in `as_tibble.list()`); if `TRUE`, the input is validated.
- Implement `as_tibble.default()` (#71, tidyverse/dplyr#1752).
- `has_rownames()` supports arguments that are not data frames.
- Two-dimensional indexing with `[[` works (#58, #63).
- Subsetting with empty index (e.g., `x[]`) also removes row names.
- Document behavior of `as_tibble.tbl_df()` for subclasses (#60).
- Document and test that subsetting removes row names.
- Don't rely on `knitr` internals for testing (#78).
- Fix compatibility with `knitr` 1.13 (#76).
- Enhance `knit_print()` tests.
- Provide default implementation for `tbl_sum.tbl_sql()` and `tbl_sum.tbl_grouped_df()` to allow `dplyr` release before a `tibble` release.
- Explicit tests for `format_v()` (#98).
- Test output for `NULL` value of `tbl_sum()`.
- Test subsetting in all variants (#62).
- Add missing test from dplyr.
- Use new `expect_output_file()` from `testthat`.
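For reference, a minimal usage sketch of `enframe()` as it exists in current tibble releases, with signature `enframe(x, name = "name", value = "value")`; it assumes the tibble package is installed:

```r
library(tibble)

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

# Named list: names go to `name`, elements to a list column `value`.
enframe(x)

# Custom column names instead of the generic defaults.
enframe(x, name = "greek", value = "thing")

# Unnamed vector: `name` falls back to the row numbers 1, 2, 3.
enframe(c(10, 20, 30))
```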

github-actions · 2020-12-15T00:40:23Z

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

krlmlr added this to the 2.0 milestone Mar 7, 2016

krlmlr pushed a commit that referenced this issue May 7, 2016

new dub()

54057ce


krlmlr mentioned this issue May 7, 2016

New enframe() #74

Merged

krlmlr closed this as completed in #74 May 11, 2016

krlmlr added a commit that referenced this issue May 11, 2016

Merge pull request #74 from hadley/feature/31-dub

eeb01a3


krlmlr pushed a commit that referenced this issue May 11, 2016

Merge tag 'v1.0-4'

494f4f1


github-actions bot locked and limited conversation to collaborators Dec 15, 2020
