Provide high performance limited functionality equivalent of tibble #350

Closed
hadley opened this issue Jan 4, 2018 · 6 comments

Comments

@hadley
Member

hadley commented Jan 4, 2018

i.e. with no evaluation or auto-naming - maybe something like tibbly <- function(...) as_tibble(list(...)) or even new_tibble(list(...)).

Or maybe we should encourage developers to use this idiom some other way?
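A minimal sketch of the wrapper proposed above, assuming the tibble package is installed. The name tibbly comes from the comment itself; the example columns are made up for illustration, and the arguments are passed already named so nothing relies on auto-naming.

```r
# Hypothetical helper from the comment above: collect the arguments into a
# plain list and hand it to as_tibble(), skipping tibble()'s lazy evaluation
# and auto-naming of arguments.
tibbly <- function(...) tibble::as_tibble(list(...))

# Illustrative data only; every column is named explicitly by the caller.
out <- tibbly(a = 1:3, b = c("x", "y", "z"))
```

Unlike tibble(), a wrapper like this cannot refer to earlier columns when defining later ones, because list(...) evaluates every argument in the caller's environment.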

@krlmlr
Member

krlmlr commented Jan 4, 2018

What is the use case, and the performance hit we're seeing with the current implementation? We can move parts to C for better performance. On that note: should we move lst() to rlang?

@hadley
Member Author

hadley commented Jan 4, 2018

Use case is that you have a known "safe" data structure, and don't want to waste any time checking (e.g. #353)

@krlmlr
Member

krlmlr commented Jan 15, 2018

Yeah, there is a difference of an order of magnitude, but we're talking about microseconds vs. a millisecond. I wonder if this is really worth the risk and the effort:

library(tibble)
library(rlang)
df <- unclass(nycflights13::flights)

microbenchmark::microbenchmark(
  tibble(!!! df),
  invoke(tibble, df),
  tibble:::new_tibble(df),
  tibble:::new_tibble(df, nrow = 336776L)
)
#> Unit: microseconds
#>                                     expr      min        lq       mean
#>                        tibble(!(!(!df))) 1304.236 1348.9510 1809.99604
#>                       invoke(tibble, df) 1834.505 1950.2320 2208.62348
#>                  tibble:::new_tibble(df)   51.673   58.9760   73.82381
#>  tibble:::new_tibble(df, nrow = 336776L)   51.407   56.8365   64.79954
#>     median       uq       max neval cld
#>  1404.4650 1546.322 25915.788   100   b
#>  2042.5190 2142.852  5741.313   100   b
#>    63.5315   68.260   876.542   100  a 
#>    61.9530   65.871   127.549   100  a

Created on 2018-01-15 by the reprex package (v0.1.1.9000).

@krlmlr
Member

krlmlr commented Jan 15, 2018

as_tibble(validate = FALSE) seems to be just fast enough (roughly a 2x slowdown compared with new_tibble() on a wide data frame, mostly due to the length check, which I'd rather keep). I'll add a reminder to update the documentation to mention this:

library(tibble)
library(rlang)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)
df <-
  nycflights13::flights %>% 
  count(origin, dest, month) %>%
  unite("relation", origin, dest, sep = "->") %>%
  spread(relation, n) %>% 
  unclass()

length(df)
#> [1] 225

microbenchmark::microbenchmark(
  tibble(!!! df),
  invoke(tibble, df),
  as_tibble(df),
  as_tibble(df, validate = FALSE),
  tibble:::new_tibble(df),
  tibble:::new_tibble(df, nrow = 12L)
)
#> Unit: microseconds
#>                                 expr       min         lq       mean
#>                    tibble(!(!(!df))) 30055.013 32404.4080 34550.9134
#>                   invoke(tibble, df) 34103.947 37124.7775 39586.9928
#>                        as_tibble(df)  1981.951  2141.3525  2528.6651
#>      as_tibble(df, validate = FALSE)   305.573   342.8565   409.1099
#>              tibble:::new_tibble(df)   172.964   195.4785   216.4160
#>  tibble:::new_tibble(df, nrow = 12L)   171.111   189.9820   220.2521
#>      median         uq       max neval  cld
#>  34036.5380 35426.0850 84756.116   100   c 
#>  39005.9890 40569.4315 69548.031   100    d
#>   2240.3870  2455.4070  5812.929   100  b  
#>    369.4780   397.9385  2820.020   100 a   
#>    207.7320   224.9585   440.641   100 a   
#>    204.1255   221.5070   429.062   100 a

Created on 2018-01-15 by the reprex package (v0.1.1.9000).

@krlmlr
Member

krlmlr commented Jan 15, 2018

Also, new_tibble() is already exported.
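As a rough illustration of using the exported constructor on a known "safe" list, here is a sketch that assumes a recent tibble release in which new_tibble() accepts nrow and validate_tibble() is exported. The column names and values are made up.

```r
library(tibble)

# Illustrative columns only: a named list whose elements all have length 3.
cols <- list(id = 1:3, label = c("a", "b", "c"))

# Low-level constructor: sets the class and row count without running the
# checks that tibble() and as_tibble() perform.
fast <- new_tibble(cols, nrow = 3L)

# Optional: run the consistency checks separately, for example in tests only.
checked <- validate_tibble(fast)
```

Keeping validation as a separate step lets package code pay for the checks only where the input is not already trusted.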

@github-actions
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 12, 2020