perf: Reduce overhead of single-column subset assignment #1363

Merged
merged 5 commits into from
Aug 28, 2022


krlmlr
Member

@krlmlr krlmlr commented Aug 28, 2022

For #1353.

One source of extra work is the distinction between new and existing columns. Beyond that, further substantial improvements seem to require moving to C code.

t <- tibble::tibble(x = 1L)
bench::mark(for (i in seq(1e4)) {
  t[["x"]] <- 1L
})
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 1 × 6
#>   expression                                  min median itr/s…¹ mem_a…² gc/se…³
#>   <bch:expr>                               <bch:> <bch:>   <dbl> <bch:b>   <dbl>
#> 1 for (i in seq(10000)) { t[["x"]] <- 1L }  162ms  165ms    6.03   100KB    46.7
#> # … with abbreviated variable names ¹​`itr/sec`, ²​mem_alloc, ³​`gc/sec`

Created on 2022-08-28 by the reprex package (v2.0.1)
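For context, the same loop can be timed against a plain base data frame. This is only a sketch: the names `tbl`, `base_df`, and `res` are made up for illustration, the iteration count is reduced to keep it quick, and the timings will vary by machine and tibble version.

```r
library(tibble)
library(bench)

tbl <- tibble(x = 1L)
base_df <- data.frame(x = 1L)

# Time repeated single-column assignment on a tibble and on a base data frame.
# check = FALSE because only the timings matter here, not the loop results.
res <- bench::mark(
  tibble = for (i in seq_len(1e3)) tbl[["x"]] <- 1L,
  base_df = for (i in seq_len(1e3)) base_df[["x"]] <- 1L,
  check = FALSE
)
res
```

The gap between the two rows gives a rough idea of the overhead added on top of base R's `[[<-`.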

@krlmlr krlmlr changed the title from "Reduce overhead of single-column subset assignment" to "perf: Reduce overhead of single-column subset assignment" Aug 28, 2022
@krlmlr krlmlr merged commit a0ccec2 into main Aug 28, 2022
@sebastian-gerdes

Hello everyone,

I can also confirm this issue (I was about to open a new issue when I found this existing one):

library(tibble)
library(tictoc)
n <- 1e5
tic()
my_tib <- tibble(.rows = n, x = NA)

# slow version: first construct the tibble, then assign within the tibble
for (i in 1:n) {
  my_tib$x[i] <- runif(1)
}
toc() # approx 10 seconds on my machine

# fast version: assign inside 'plain' vector and construct tibble later
tic()
x <- rep(NA, n)
for (i in 1:n) {
  x[i] <- runif(1)
}
my_tib <- tibble(x = x)
toc() # approx 0.1 seconds on my machine

I would really like to use the first version, since it makes the code for my simulations a lot simpler. However, the performance might be a deal-breaker for me.

So I would really appreciate any attempts to improve the performance of tibble in this aspect!

Thanks and best greetings,
Sebastian
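A possible middle ground between the two versions above, sketched here as an assumption rather than as a recommendation from the maintainers: keep the tibble as the container, but pull the column out once, fill it as a plain vector, and write it back with a single assignment. The tibble assignment method then runs once instead of once per iteration. The smaller `n` and the `NA_real_` initial value are choices made for this illustration.

```r
library(tibble)

n <- 1e4
my_tib <- tibble(x = rep(NA_real_, n))

# Extract the column once so the loop works on a plain double vector.
x <- my_tib$x
for (i in seq_len(n)) {
  x[i] <- runif(1)
}

# Write the filled vector back with a single column assignment.
my_tib$x <- x
```

When each element does not depend on the previous ones, as in this toy loop, `my_tib$x <- runif(n)` avoids the loop entirely.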

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 2, 2024