Make as_data_frame a generic so can work on matrices? #876
We already have
Maybe we need to make
I have amused myself with this workaround now that I know about existing

    > plyr::alply(foo, 2) %>% as_data_frame
    Source: local data frame [3 x 2]
      1 2
    1 A D
    2 B E
    3 C F
To chime in, I do the following all over the place:

    result = some_function_call() %>%
        as.data.frame() %>%
        as_data_frame()

Furthermore, I’m not sure I understand the need for the function
This issue was referenced on @jennybc's Twitter account today, so I thought now might be a good time to chime in. I also use @klmr's pattern very frequently (except my code usually looks more like
I'm not sure I understand @hadley's concern about speed. I thought that generics only took a few extra thousandths of a second. Is
If speed is still a concern, then I'd also be nearly as happy with a function like this one:
It would add a small amount of cognitive overhead compared to just using a generic
@davharris I'm concerned about speed because sometimes I apply
@hadley Okay, I hadn't realized it would be called that many times. Here's some quick benchmarking that might be useful.
Here are the results on my laptop:
It looks like this dispatch implementation increases the computation time on a half-million lists from 25.6 seconds to 26.8 seconds. It probably makes sense to do more extensive benchmarking (for example, would things slow down if we added more methods?). So far, things look pretty good, though.
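A rough sketch of how this kind of dispatch overhead could be measured in base R. The function names (`convert_list`, `convert_generic`) are hypothetical stand-ins, not dplyr code, and the workload is assumed to be much smaller than the half-million lists mentioned above so that it runs quickly:

```r
# Hypothetical stand-in for a list-only conversion function.
convert_list <- function(x) {
  as.data.frame(x, stringsAsFactors = FALSE)
}

# Hypothetical S3 generic that wraps the same work behind method dispatch.
convert_generic <- function(x, ...) UseMethod("convert_generic")
convert_generic.list <- function(x, ...) convert_list(x)

# A batch of small lists to convert; n is an assumption chosen for speed.
n <- 2000
inputs <- replicate(n, list(a = 1:3, b = letters[1:3]), simplify = FALSE)

# Time the direct call and the dispatched call over the same inputs.
time_direct  <- system.time(res_direct  <- lapply(inputs, convert_list))
time_generic <- system.time(res_generic <- lapply(inputs, convert_generic))

# Elapsed seconds for each variant; the gap approximates the dispatch cost.
print(c(direct  = time_direct[["elapsed"]],
        generic = time_generic[["elapsed"]]))
```

Timings from a single run like this are noisy, so repeating the measurement several times gives a more trustworthy comparison.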
@davharris thanks - that seems reasonable and suggests that I shouldn't worry about it. I'll make the change when I'm next working on dplyr. An efficient implementation of
@hadley Perfect. Thanks.
Déjà vu: trying to turn a character matrix into a
Bump.

    library(dplyr)
    ip <- installed.packages() %>% as.tbl()
    #> Error in UseMethod("as.tbl"): no applicable method for 'as.tbl' applied to an object of class "c('matrix', 'character')"
    ip <- installed.packages() %>% as_data_frame()
    #> Error: is.list(x) is not TRUE
I actually just wrote an efficient implementation for this for tidyr - I'll hoist it up into dplyr.
Add method for list (existing), data frame (trivial) and matrix (from tidyr). Fixes tidyverse#876
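A minimal base-R sketch of how a generic with list, data frame and matrix methods can be laid out. The name `as_df_sketch` and its methods are hypothetical and illustrative only; this is not the dplyr or tibble implementation, and the matrix method here is plain R, not the efficient version mentioned in the thread:

```r
# Hypothetical generic; the name `as_df_sketch` is illustrative only.
as_df_sketch <- function(x, ...) {
  UseMethod("as_df_sketch")
}

# Data frames are returned unchanged (the "trivial" method).
as_df_sketch.data.frame <- function(x, ...) {
  x
}

# Lists: require equal-length elements, then build the data frame.
as_df_sketch.list <- function(x, ...) {
  lens <- vapply(x, length, integer(1))
  stopifnot(length(unique(lens)) <= 1)
  as.data.frame(x, stringsAsFactors = FALSE)
}

# Matrices: split into one vector per column, then reuse the list method.
as_df_sketch.matrix <- function(x, ...) {
  cols <- lapply(seq_len(ncol(x)), function(j) x[, j])
  nms <- colnames(x)
  # Assumed fallback names when the matrix has no column names.
  if (is.null(nms)) nms <- paste0("V", seq_len(ncol(x)))
  names(cols) <- nms
  as_df_sketch.list(cols)
}

m <- matrix(LETTERS[1:6], nrow = 3)
df <- as_df_sketch(m)
print(df)
```

With this layout, a character matrix can be passed straight to the generic without first going through `as.data.frame()`.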
- Initial CRAN release
- Extracted from `dplyr` 0.4.3
- Exported functions:
    - `tbl_df()`
    - `as_data_frame()`
    - `data_frame()`, `data_frame_()`
    - `frame_data()`, `tibble()`
    - `glimpse()`
    - `trunc_mat()`, `knit_print.trunc_mat()`
    - `type_sum()`
    - New `lst()` and `lst_()` create lists in the same way that `data_frame()` and `data_frame_()` create data frames (tidyverse/dplyr#1290). `lst(NULL)` doesn't raise an error (#17, @jennybc), but always uses deparsed expression as name (even for `NULL`).
    - New `add_row()` makes it easy to add a new row to data frame (tidyverse/dplyr#1021).
    - New `rownames_to_column()` and `column_to_rownames()` (#11, @zhilongjia).
    - New `has_rownames()` and `remove_rownames()` (#44).
    - New `repair_names()` fixes missing and duplicate names (#10, #15, @r2evans).
    - New `is_vector_s3()`.
- Features
    - New `as_data_frame.table()` with argument `n` to control name of count column (#22, #23).
    - Use `tibble` prefix for options (#13, #36).
    - `glimpse()` now (invisibly) returns its argument (tidyverse/dplyr#1570). It is now a generic, the default method dispatches to `str()` (tidyverse/dplyr#1325). The default width is obtained from the `tibble.width` option (#35, #56).
    - `as_data_frame()` is now an S3 generic with methods for lists (the old `as_data_frame()`), data frames (trivial), matrices (with efficient C++ implementation) (tidyverse/dplyr#876), and `NULL` (returns a 0-row 0-column data frame) (#17, @jennybc).
    - Non-scalar input to `frame_data()` and `tibble()` (including lists) creates list-valued columns (#7). These functions return 0-row but n-col data frame if no data.
- Bug fixes
    - `frame_data()` properly constructs rectangular tables (tidyverse/dplyr#1377, @kevinushey).
- Minor modifications
    - Uses `setOldClass(c("tbl_df", "tbl", "data.frame"))` to help with S4 (tidyverse/dplyr#969).
    - `tbl_df()` automatically generates column names (tidyverse/dplyr#1606).
    - `tbl_df`s gain `$` and `[[` methods that are ~5x faster than the defaults, never do partial matching (tidyverse/dplyr#1504), and throw an error if the variable does not exist. `[[.tbl_df()` falls back to regular subsetting when used with anything other than a single string (#29). `base::getElement()` now works with tibbles (#9).
    - `all_equal()` allows to compare data frames ignoring row and column order, and optionally ignoring minor differences in type (e.g. int vs. double) (tidyverse/dplyr#821). Used by `all.equal()` for tibbles. (This package contains a pure R implementation of `all_equal()`, the `dplyr` code has identical behavior but is written in C++ and thus faster.)
    - The internals of `data_frame()` and `as_data_frame()` have been aligned, so `as_data_frame()` will now automatically recycle length-1 vectors. Both functions give more informative error messages if you are attempting to create an invalid data frame. You can no longer create a data frame with duplicated names (tidyverse/dplyr#820). Both functions now check that you don't have any `POSIXlt` columns, and tell you to use `POSIXct` if you do (tidyverse/dplyr#813). `data_frame(NULL)` raises error "must be a 1d atomic vector or list".
    - `trunc_mat()` and `print.tbl_df()` are considerably faster if you have very wide data frames. They will now also only list the first 100 additional variables not already on screen - control this with the new `n_extra` parameter to `print()` (tidyverse/dplyr#1161). The type of list columns is printed correctly (tidyverse/dplyr#1379). The `width` argument is used also for 0-row or 0-column data frames (#18).
    - When used in list-columns, S4 objects only print the class name rather than the full class hierarchy (#33).
    - Add test that `[.tbl_df()` does not change class (#41, @jennybc). Improve `[.tbl_df()` error message.
- Documentation
    - Update README, with edits (#52, @bhive01) and enhancements (#54, @jennybc).
    - `vignette("tibble")` describes the difference between tbl_dfs and regular data frames (tidyverse/dplyr#1468).
- Code quality
    - Test using new-style Travis-CI and AppVeyor. Full test coverage (#24, #53). Regression tests load known output from file (#49).
    - Renamed `obj_type()` to `obj_sum()`, improvements, better integration with `type_sum()`.
    - Internal cleanup.
I love love love `data_frame()`. I wish I could get that behaviour when converting an existing matrix to a data.frame.