Performance of [[ #1353

Open
mgirlich opened this issue Aug 25, 2022 · 2 comments

Comments

@mgirlich
Contributor

While working on a rectangling tool for recursive data frames (see tidyverse/tidyr#1386), I noticed that [[ on tibbles (here, assignment via [[<-) has a noticeable performance cost compared to data frames and lists. Do you see a chance of improving the performance? Or maybe of adding a low-level version for assignment?

f <- function(x, n = 10e3) {
  for (i in seq(n)) {
    x[["x"]] <- 1L
  }
}

t <- tibble::tibble(x = 1L)
df <- data.frame(x = 1L)
l <- list(x = 1L)

bench::mark(
  tibble = f(t),
  dataframe = f(df),
  list = f(l)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 3 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 tibble     523.22ms 523.22ms      1.91   211.5KB     15.3
#> 2 dataframe   78.31ms  80.04ms     12.0     80.9KB     18.0
#> 3 list         1.35ms   1.52ms    560.          0B     31.9

Created on 2022-08-25 with reprex v2.0.2

@krlmlr
Member

krlmlr commented Aug 28, 2022

Thanks, confirmed. On my system:

t <- tibble::tibble(x = 1L)
bench::mark(for (i in seq(1e4)) {
  t[["x"]] <- 1L
})
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#>   expression                                  min median itr/s…¹ mem_a…² gc/se…³
#>   <bch:expr>                               <bch:> <bch:>   <dbl> <bch:b>   <dbl>
#> 1 for (i in seq(10000)) { t[["x"]] <- 1L }  214ms  214ms    4.66   100KB    46.6
#> # … with abbreviated variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`

Created on 2022-08-28 by the reprex package (v2.0.1)

@mgirlich
Contributor Author

Thanks for working on this directly. What do you think about a low-level function tib_assign_col(df, j, value)? It should allow for substantial performance improvements. Though maybe it should live in vctrs?
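
For illustration, here is a minimal base-R sketch of what such a helper might look like. Only the name tib_assign_col and its argument list come from the comment above. The body is an assumption, not tibble or vctrs code: it assumes j is a single column name, that value already has exactly one element per row (no recycling), and that dropping to a bare list and restoring the attributes afterwards is acceptable. It has not been benchmarked against the numbers above.

```r
# Hypothetical helper: the name and signature are taken from the comment above,
# the body is an illustrative assumption and not part of tibble or vctrs.
tib_assign_col <- function(df, j, value) {
  # Minimal checks only; the point is to skip the general-purpose validation
  # that the `[[<-` methods perform.
  stopifnot(
    is.data.frame(df),
    is.character(j), length(j) == 1L, !is.na(j),
    !is.null(value)
  )

  # Number of rows, read from the (possibly compact) row names.
  n <- .row_names_info(df, 2L)

  # Assumption: no recycling. length() is used for simplicity, so data frame
  # columns are not supported; vctrs::vec_size() would be the general choice.
  stopifnot(length(value) == n)

  # Drop the class so that `[[<-` dispatches to the plain list method.
  attrs <- attributes(df)
  out <- unclass(df)
  out[[j]] <- value

  # Restore class and row names; names may have grown by a new column.
  attrs$names <- names(out)
  attributes(out) <- attrs
  out
}

df <- data.frame(x = 1L, y = "a")
df <- tib_assign_col(df, "x", 2L)   # overwrite an existing column
df <- tib_assign_col(df, "z", TRUE) # append a new column
```

Because the class and row names are copied back unchanged, the same approach should also work for a tibble, but the sketch deliberately gives up name repair, recycling, and type checks, which is the trade-off a low-level function would have to document.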
