-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unnest_tree()
#1386
Add unnest_tree()
#1386
Conversation
# unclass to avoid slow `[[.tbl_df` and `[[<-.tbl_df` | ||
x <- unclass(x) | ||
child_children <- x[[child_col]] | ||
if (inherits(child_children, "vctrs_list_of")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing the vctrs_list_of
class has huge impact on performance:
After removing:
devtools::load_all("~/GitHub/tidyr/")
#> ℹ Loading tidyr
test_tree <- readr::read_rds("~/GitHub/tidyr/test-tree.rds")
bench::mark(
unnest_tree = unnest_tree(
test_tree,
id,
children
)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 unnest_tree 170ms 180ms 5.27 3.97MB 7.02
Created on 2022-08-25 with reprex v2.0.2
Before removing:
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 unnest_tree 26.9s 26.9s 0.0372 508MB 7.62
Created on 2022-08-25 with reprex v2.0.2
} | ||
|
||
# unclass to avoid slow `[[.tbl_df` and `[[<-.tbl_df` | ||
x <- unclass(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not quite as bad as the list_of
performance hit but still a lot. Without unclassing this takes
devtools::load_all("~/GitHub/tidyr/")
#> ℹ Loading tidyr
test_tree <- readr::read_rds("~/GitHub/tidyr/test-tree.rds")
bench::mark(
unnest_tree = unnest_tree(
test_tree,
id,
children
)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 unnest_tree 425ms 428ms 2.34 3.89MB 7.01
Created on 2022-08-25 with reprex v2.0.2
I won't be able to take a look for a while, but it seems interesting at first glance BTW I just merged #1387 so you should be able to rebase/merge-main so the checks can pass (except some weirdness with the pkgdown one right now) |
Thanks for the quick feedback. I'm fine with waiting for a review, already good to know that you consider it interesting. Some questions for the future:
|
Closes #1384.
@DavisVaughan Would you be interested in adding this to tidyr? If so, what about the inverse operation
nest_tree()
?