FR: Make a data frame from a (possibly named) vector or list #31

Closed
jennybc opened this issue Mar 1, 2016 · 13 comments · Fixed by #74

@jennybc
Member

jennybc commented Mar 1, 2016

Here's something I do fairly often, mostly with a list, but sometimes with a vector: initialize a data frame with that list or vector as a variable and, at the same time, promote its names to a proper variable. Or, perhaps, add a variable of row numbers. Why is it so important to add the names or row numbers? Because later you'll want to process the result with tidyr, e.g. with unnest() and/or spread().

I could point to some real uses if I need to really sell this. But hopefully this will just make sense. Or someone will tell me it's already easy to do? It is already easy, but perhaps worth making a function for.

library(tibble)

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

## wish it were easy to make the names a proper variable
data_frame(id = names(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

## where id can easily default to row number
data_frame(id = seq_along(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (int)   (list)
#> 1     1 <chr[1]>
#> 2     2 <chr[1]>
#> 3     3 <chr[1]>
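For the atomic case, base R's stack() already gets close: it returns the values together with a factor column named ind that holds each element's name. It flattens with unlist(), though, so it cannot keep a list column like thing above. A sketch for comparison only, assuming every element is an atomic vector:

```r
# Same example input as above
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

# stack() unlists the elements into `values` and records each element's
# name in a factor column `ind`; every element must be an atomic vector
s <- stack(x)
s$ind <- as.character(s$ind)  # turn the factor into a plain character id
names(s) <- c("thing", "id")  # rename to match the example above
s <- s[c("id", "thing")]      # put the id column first
s
```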
@krlmlr
Member

krlmlr commented Mar 2, 2016

What verb would you use for this operation?

How about:

x %>% as_data_frame %>% tidyr::gather(id, thing)

Need to coerce to list if x is a vector. The following doesn't work:

x %>% tidyr::gather(id, thing)

I think this could be fixed by implementing gather_.list <- function(x, ...) gather_(tibble::as_data_frame(x), ...).

EDIT: The above also doesn't work if x is unnamed, but here you could use x %>% data_frame(thing=.) %>% add_rownames.

EDIT²: I think a new verb would help here; I'm not sure whether it belongs here or in tidyr.

@jennybc
Member Author

jennybc commented Mar 2, 2016

I will try to propose a name.

@jennybc
Member Author

jennybc commented Mar 7, 2016

The more I think about it, maybe it makes sense to think about this as a treatment applied to a variable during the construction of a data frame. A way to say "add this variable AND promote its names to a proper variable". And also give some nice way of getting row numbers into the data frame? I have found dplyr::row_number() to be quite confusing / disappointing.

I realize id() is probably overloaded with meaning already. upname() isn't great either, but hopefully it conveys the idea.

library(tibble)
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

What if something like this:

df <- data_frame(id(x, "greek"))
## or
df <- data_frame(upname(x, "greek"))

produced this result

data_frame(greek = names(x), x = x)
#> Source: local data frame [3 x 2]
#> 
#>   greek        x
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

I also wish it were easier to get plain row numbers. I wish this were what row_number() did, but clearly it is not. So, again, fiction! What if something like this:

df <- data_frame(i = row_number(), upname(x, "greek"))

produced something like this:

data_frame(i = seq_along(x),
           greek = names(x),
           x = x)
#> Source: local data frame [3 x 3]
#> 
#>       i greek        x
#>   (int) (chr)   (list)
#> 1     1 alpha <chr[1]>
#> 2     2  beta <chr[1]>
#> 3     3 gamma <chr[1]>

@jennybc
Member Author

jennybc commented Mar 7, 2016

In addition to id() and upname(), dub() is a possible name for this name-promoting variable-pre-processing function.

@krlmlr
Member

krlmlr commented Mar 7, 2016

Row numbers: Have you seen #11? Your example would be then

data_frame(...) %>% rownames_to_column("i")
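Until something like #11 lands, a row-number column can be put at the front with base R alone. The helper add_row_numbers() and its var argument are hypothetical, invented for this sketch:

```r
# Hypothetical helper: prepend a column of row numbers named `var`.
# Note: an existing column called `var` is overwritten and moved to the front.
add_row_numbers <- function(df, var = "i") {
  df[[var]] <- seq_len(nrow(df))
  df[c(var, setdiff(names(df), var))]
}

# Works with list columns too, because `[` on columns keeps them intact
df <- data.frame(greek = c("alpha", "beta", "gamma"), stringsAsFactors = FALSE)
df$x <- list("horrible", "list", "column")
add_row_numbers(df)
```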

How would you like this:

dub <- function(x) as_data_frame(setNames(list(names(x), x), c("name", "value")))

This allows at least the creation of a two-column data frame from a named object, which then can be massaged further with the other dplyr verbs, and combined with other data frames using cbind().

@hadley: Would this perhaps be suitable for purrr:

unzip_names <- function(x) set_names(list(x, names(x)), c("name", "value"))
zip_names <- function(x) set_names(x[[1]], x[[2]])

@krlmlr krlmlr added this to the 2.0 milestone Mar 7, 2016
@jennybc
Member Author

jennybc commented Mar 8, 2016

Sorry, I can't really tell what #11 does just from reading the discussion. But I take your word for it that it would add the integers 1 through nrow(.) as variable i. Would it add it as the first or last variable? It feels like you usually want it at the very front.

library(tibble)
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
dub <- function(x) as_data_frame(setNames(list(names(x), x), c("name", "value")))
dub(x)
#> Source: local data frame [3 x 2]
#> 
#>    name    value
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

The variables themselves and the object look great. Would dub() gain some arguments or different defaults in order to produce less generic names?
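One way dub() could gain arguments, sketched in base R rather than with as_data_frame() so it runs without tibble: the column names become parameters defaulting to "name" and "value", and an unnamed input falls back to row numbers, as wished for at the top of the thread. The argument names and the fallback are assumptions for illustration, not an agreed API.

```r
# Hypothetical extension of the dub() proposed above
dub <- function(x, name = "name", value = "value") {
  ids <- names(x)
  if (is.null(ids)) ids <- seq_along(x)  # unnamed input: use row numbers
  out <- data.frame(ids, stringsAsFactors = FALSE)
  names(out) <- name
  # Assigning a list of matching length creates a list column;
  # an atomic vector becomes an ordinary column
  out[[value]] <- unname(x)
  out
}

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
dub(x, name = "greek", value = "thing")
dub(c(10, 20))
```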

UPDATE: I think x and names of x are reversed in those speculative purrr functions.

@hadley
Member

hadley commented Mar 8, 2016

What if this was just the as_data_frame() method for vectors?

@krlmlr
Member

krlmlr commented Mar 8, 2016

@jennybc: purrr: Right, revised definition below.

unzip_names <- function(x) set_names(list(names(x), x), c("name", "value"))
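With the revised unzip_names(), the names sit in the first element, so a matching zip_names() must take the values from value and the names from name; the earlier draft's x[[1]]/x[[2]] indexing assumed the old order. Indexing by name avoids that pitfall. A base R sketch using setNames() in place of purrr's set_names(); both function names are the hypothetical ones from this discussion, not existing purrr functions.

```r
# Split a named vector or list into its names and its (unnamed) values
unzip_names <- function(x) list(name = names(x), value = unname(x))

# Inverse: reattach the names; indexing by name keeps the pair consistent
zip_names <- function(y) setNames(y$value, y$name)

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
identical(zip_names(unzip_names(x)), x)
```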

rownames_to_column() will add to the front. Actually, we already have add_rownames() in dplyr, but it does "too much" and will be deprecated in favor of the new functions.

Defaults: I think we should support them, even if the renaming could be handled with a simple rename() step. tidyr::gather() has them too.

@hadley: Is there a dispatch for vector, as in as_data_frame.vector()? Otherwise it looks like we need to have as_data_frame.logical(), as_data_frame.character(), ...; we also need as_data_frame.Date(), as_data_frame.POSIXt(), ..., all with the same implementation. It also includes a certain amount of surprise -- if functions use as_data_frame() to convert input, the column names are auto-generated and the user might be unaware of it. Do you think it's worth it?

@hadley
Member

hadley commented Mar 17, 2016

@krlmlr No, there's no "vector" virtual class, so implementation would be a bit tedious. But we don't need to have methods for Date and factor etc, because those will be caught by the method for the underlying atomic vector.

It seems like we're adding new functionality that previously was an error, so it doesn't seem too dangerous to me.
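On the dispatch question: for an object that has a class attribute, such as a Date or a factor, S3 dispatch tries the classes in class(x) and then the default method, so a .default method (like the as_data_frame.default() mentioned in the commit notes of this thread) is one place that catches every atomic vector. The generic to_frame() below is hypothetical and only mirrors the shape of such a method:

```r
# Hypothetical generic, standing in for as_data_frame()
to_frame <- function(x, ...) UseMethod("to_frame")

# Fallback method: reached by any object without a more specific method,
# including classed vectors such as Date and factor
to_frame.default <- function(x, ...) {
  if (!is.atomic(x)) stop("to_frame() expects an atomic vector", call. = FALSE)
  data.frame(value = x, stringsAsFactors = FALSE)
}

# A more specific method still wins when one exists
to_frame.data.frame <- function(x, ...) x

str(to_frame(as.Date(c("2016-03-08", "2016-03-17"))))  # value keeps class Date
str(to_frame(factor(c("a", "b"))))                      # value stays a factor
```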

@krlmlr
Member

krlmlr commented Apr 8, 2016

@jennybc: For now you could try kimisc::list_to_df() -- I totally forgot about this guy. I still think this should be part of tibble.

@ijlyttle

Apologies if this is not helpful, but could purrr::map_df be useful?

library("dplyr")
library("purrr")

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

x %>% map_df(~ data_frame(thing = .x), .id = "name")

Could a new verb be put in place of the function within map_df?
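map_df()'s .id argument stores each element's name in a column, so the call above yields one row per element plus a name column. For comparison, a base R sketch of the same row-binding idea with Map() and rbind(); it assumes every element is an atomic vector and is an illustration, not a proposal for the new verb.

```r
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

# One small data frame per element; the element's name is recycled
# along its values
pieces <- Map(function(nm, v) data.frame(name = nm, thing = v, stringsAsFactors = FALSE),
              names(x), x)

# Stack the pieces; unname() keeps rbind() from turning the element
# names into row names
out <- do.call(rbind, unname(pieces))
out
```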

krlmlr pushed a commit that referenced this issue May 7, 2016
@hadley: Do we want this in tibble (#31)?
@krlmlr krlmlr mentioned this issue May 7, 2016
@krlmlr
Member

krlmlr commented May 7, 2016

Note that this is already possible with #71:

x %>% as_data_frame %>% rownames_to_column

krlmlr added a commit that referenced this issue May 11, 2016
- New `enframe()` that converts vectors to two-column tibbles (#31, #74).
krlmlr pushed a commit that referenced this issue May 11, 2016
- New `enframe()` that converts vectors to two-column tibbles (#31, #74).
- Fix compatibility with `knitr` 1.13 (#76).
- Implement `as_data_frame.default()` (#71, tidyverse/dplyr#1752).
krlmlr pushed a commit that referenced this issue Jul 4, 2016
Follow-up release.

- `tibble()` is no longer an alias for `frame_data()` (#82).
- Remove `tbl_df()` (#57).
- `$` returns `NULL` if column not found, without partial matching. A warning is given (#109).
- `[[` returns `NULL` if column not found (#109).

- Reworked output: More concise summary (begins with hash `#` and contains more text (#95)), removed empty line, showing number of hidden rows and columns (#51). The trailing metadata also begins with hash `#` (#101). Presence of row names is indicated by a star in printed output (#72).
- Format `NA` values in character columns as `<NA>`, like `print.data.frame()` does (#69).
- The number of printed extra cols is now an option (#68, @lionel-).
- Computation of column width properly handles wide (e.g., Chinese) characters, tests still fail on Windows (#100).
- `glimpse()` shows nesting structure for lists and uses angle brackets for type (#98).
- Tibbles with `POSIXlt` columns can be printed now, the text `<POSIXlt>` is shown as placeholder to encourage usage of `POSIXct` (#86).
- `type_sum()` shows only topmost class for S3 objects.

- Strict checking of integer and logical column indexes. For integers, passing a non-integer index or an out-of-bounds index raises an error. For logicals, only vectors of length 1 or `ncol` are supported. Passing a matrix or an array now raises an error in any case (#83).
- Warn if setting non-`NULL` row names (#75).
- Consistently surround variable names with single quotes in error messages.
- Use "Unknown column 'x'" as error message if column not found, like base R (#94).
- `stop()` and `warning()` are now always called with `call. = FALSE`.

- The `.Dim` attribute is silently stripped from columns that are 1d matrices (#84).
- Converting a tibble without row names to a regular data frame does not add explicit row names.
- `as_tibble.data.frame()` preserves attributes, and uses `as_tibble.list()` directly instead of calling overridden methods, which may lead to endless recursion.

- New `has_name()` (#102).
- Prefer `tibble()` and `as_tibble()` over `data_frame()` and `as_data_frame()` in code and documentation (#82).
- New `is.tibble()` and `is_tibble()` (#79).
- New `enframe()` that converts vectors to two-column tibbles (#31, #74).
- `obj_sum()` and `type_sum()` show `"tibble"` instead of `"tbl_df"` for tibbles (#82).
- `as_tibble.data.frame()` gains `validate` argument (as in `as_tibble.list()`), if `TRUE` the input is validated.
- Implement `as_tibble.default()` (#71, tidyverse/dplyr#1752).
- `has_rownames()` supports arguments that are not data frames.

- Two-dimensional indexing with `[[` works (#58, #63).
- Subsetting with empty index (e.g., `x[]`) also removes row names.

- Document behavior of `as_tibble.tbl_df()` for subclasses (#60).
- Document and test that subsetting removes row names.

- Don't rely on `knitr` internals for testing (#78).
- Fix compatibility with `knitr` 1.13 (#76).
- Enhance `knit_print()` tests.
- Provide default implementation for `tbl_sum.tbl_sql()` and `tbl_sum.tbl_grouped_df()` to allow `dplyr` release before a `tibble` release.
- Explicit tests for `format_v()` (#98).
- Test output for `NULL` value of `tbl_sum()`.
- Test subsetting in all variants (#62).
- Add missing test from dplyr.
- Use new `expect_output_file()` from `testthat`.
@github-actions
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 15, 2020