Provide high performance limited functionality equivalent of tibble #350
Comments
What is the use case, and the performance hit we're seeing with the current implementation? We can move parts to C for better performance. On that note: should we move |
The use case is that you have a known "safe" data structure and don't want to waste any time checking it (e.g. #353). |
Yeah, there is a difference of an order of magnitude, but we're talking about microseconds vs. a millisecond. I wonder if this is really worth the risk and the effort:
library(tibble)
library(rlang)
df <- unclass(nycflights13::flights)
microbenchmark::microbenchmark(
tibble(!!! df),
invoke(tibble, df),
tibble:::new_tibble(df),
tibble:::new_tibble(df, nrow = 336776L)
)
#> Unit: microseconds
#> expr min lq mean
#> tibble(!(!(!df))) 1304.236 1348.9510 1809.99604
#> invoke(tibble, df) 1834.505 1950.2320 2208.62348
#> tibble:::new_tibble(df) 51.673 58.9760 73.82381
#> tibble:::new_tibble(df, nrow = 336776L) 51.407 56.8365 64.79954
#> median uq max neval cld
#> 1404.4650 1546.322 25915.788 100 b
#> 2042.5190 2142.852 5741.313 100 b
#> 63.5315 68.260 876.542 100 a
#> 61.9530 65.871 127.549 100 a
|
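To put the "order of magnitude" in numbers, here is a small sketch that divides the reported medians by the median of the bare constructor. The values are copied from the benchmark output above; the vector names are just labels chosen for illustration.

```r
# Medians in microseconds, copied from the first benchmark output above.
median_us <- c(
  tibble_splice = 1404.4650,  # tibble(!!! df)
  invoke        = 2042.5190,  # invoke(tibble, df)
  new_tibble    = 63.5315     # tibble:::new_tibble(df)
)

# How many times slower each call is than the bare constructor.
slowdown <- median_us / median_us[["new_tibble"]]
round(slowdown, 1)
```

The absolute gap is still on the order of one to two milliseconds per call, which is the point being made about risk versus benefit.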
library(tibble)
library(rlang)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
df <-
nycflights13::flights %>%
count(origin, dest, month) %>%
unite("relation", origin, dest, sep = "->") %>%
spread(relation, n) %>%
unclass()
length(df)
#> [1] 225
microbenchmark::microbenchmark(
tibble(!!! df),
invoke(tibble, df),
as_tibble(df),
as_tibble(df, validate = FALSE),
tibble:::new_tibble(df),
tibble:::new_tibble(df, nrow = 12L)
)
#> Unit: microseconds
#> expr min lq mean
#> tibble(!(!(!df))) 30055.013 32404.4080 34550.9134
#> invoke(tibble, df) 34103.947 37124.7775 39586.9928
#> as_tibble(df) 1981.951 2141.3525 2528.6651
#> as_tibble(df, validate = FALSE) 305.573 342.8565 409.1099
#> tibble:::new_tibble(df) 172.964 195.4785 216.4160
#> tibble:::new_tibble(df, nrow = 12L) 171.111 189.9820 220.2521
#> median uq max neval cld
#> 34036.5380 35426.0850 84756.116 100 c
#> 39005.9890 40569.4315 69548.031 100 d
#> 2240.3870 2455.4070 5812.929 100 b
#> 369.4780 397.9385 2820.020 100 a
#> 207.7320 224.9585 440.641 100 a
#> 204.1255 221.5070 429.062 100 a
|
Also, |
i.e. with no evaluation or auto-naming - maybe something like
tibbly <- function(...) as_tibble(list(...))
or even new_tibble(list(...)).
Or maybe we should encourage developers to use this idiom some other way?
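A minimal sketch of what such a helper could look like, assuming a tibble release that exports new_tibble() with an nrow argument (the benchmarks in this thread call it via ::: because it was internal at the time). The name tibbly comes from the proposal above and is hypothetical. The sketch also assumes the caller passes at least one column and that all columns are named and of equal length, since nothing here checks that.

```r
library(tibble)

# Hypothetical helper: wraps already-evaluated, already-named columns in a
# tibble without the evaluation, auto-naming, and validation that tibble() does.
tibbly <- function(...) {
  cols <- list(...)
  # Assumption: at least one column, and every column has the same length,
  # so the row count can be read from the first column.
  new_tibble(cols, nrow = length(cols[[1L]]))
}

x <- tibbly(a = 1:3, b = letters[1:3])
x
```

Because nothing is validated, passing columns of unequal length would yield a malformed tibble, so a helper like this only makes sense for data that is already known to be safe.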