rows_patch(x, y, ...) shows unhelpful error message when nrow(y) > nrow(x) #5699

Closed
SteadyGiant opened this issue Jan 20, 2021 · 4 comments · Fixed by #6203
Labels
tables 🧮 joins and set operations

Comments

@SteadyGiant

When I try to patch a data frame (x in the below example) using another data frame with more rows (y), an error is thrown with an unhelpful message. It took some time to figure out that the row number difference was the source of the error. rlang::last_error() and rlang::last_trace() didn't help.

Is there any reason a data frame should not be patched by a data frame with more rows? I could be missing some theory here. If there is a good reason why this should not be allowed, can I add that to the documentation for rows_patch() and friends?

Example:

suppressPackageStartupMessages(library(dplyr))

x = data.frame(id = c(1, 2, 3), foo = c("a", "b", NA))
y = data.frame(id = c(1, 3, 5, 7), foo = c("a", "c", "e", "g"))
dplyr::rows_patch(x, y, by = "id")
#> Error: Attempting to patch missing rows.

Created on 2021-01-20 by the reprex package (v0.3.0)

Session info
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.30       magrittr_2.0.1   tidyselect_1.1.0 R6_2.5.0        
#>  [5] rlang_0.4.10     stringr_1.4.0    highr_0.8        tools_4.0.3     
#>  [9] xfun_0.20        DBI_1.1.1        htmltools_0.5.1  ellipsis_0.3.1  
#> [13] yaml_2.2.1       digest_0.6.27    assertthat_0.2.1 tibble_3.0.5    
#> [17] lifecycle_0.2.0  crayon_1.3.4     purrr_0.3.4      vctrs_0.3.6     
#> [21] glue_1.4.2       evaluate_0.14    rmarkdown_2.6    stringi_1.5.3   
#> [25] compiler_4.0.3   pillar_1.4.7     generics_0.1.0   pkgconfig_2.0.3

Expected output:

data.frame(id = c(1, 2, 3), foo = c("a", "b", "c"))
#>   id foo
#> 1  1   a
#> 2  2   b
#> 3  3   c
@SteadyGiant SteadyGiant changed the title rows_patch() throws unhelpful error when nrow(y) > nrow(x) rows_patch(x, y, ...) throws unhelpful error when nrow(y) > nrow(x) Jan 20, 2021
@SteadyGiant SteadyGiant changed the title rows_patch(x, y, ...) throws unhelpful error when nrow(y) > nrow(x) rows_patch(x, y, ...) shows unhelpful error message when nrow(y) > nrow(x) Jan 20, 2021
@hadley hadley added bug an unexpected problem or unintended behavior tables 🧮 joins and set operations labels Apr 19, 2021
@hadley hadley changed the title rows_patch(x, y, ...) shows unhelpful error message when nrow(y) > nrow(x) rows_patch(x, y, ...) shows unhelpful error message when nrow(y) > nrow(x) Apr 19, 2021
@krlmlr
Member

krlmlr commented Jun 24, 2021

I think the error is good. The code attempts to patch rows that don't exist on the left-hand side; this is an error for rows_patch() but works in rows_upsert(). Do we need a better error message?

x = data.frame(id = c(1, 2, 3), foo = c("a", "b", NA))
y = data.frame(id = c(1, 3, 5, 7), foo = c("a", "c", "e", "g"))
dplyr::rows_upsert(x, y, by = "id")
#>   id foo
#> 1  1   a
#> 2  2   b
#> 3  3   c
#> 4  5   e
#> 5  7   g

Created on 2021-06-24 by the reprex package (v2.0.0)
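
For readers hitting the same error, the rows of y that trigger it can be listed before calling rows_patch(). This is only a sketch using the data from the original report; anti_join() is an existing dplyr verb, while the name unmatched_rows is made up for illustration.

```r
library(dplyr)

x <- data.frame(id = c(1, 2, 3), foo = c("a", "b", NA))
y <- data.frame(id = c(1, 3, 5, 7), foo = c("a", "c", "e", "g"))

# Rows of `y` whose key has no match in `x`. These are the rows that
# rows_patch() refuses to patch (`unmatched_rows` is an illustrative name).
unmatched_rows <- anti_join(y, x, by = "id")
unmatched_rows$id
#> [1] 5 7
```

An empty result here means every key in y exists in x, so the patch should not fail for this reason.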

@drmowinckels

I'm commenting here, as it's the cleanest place I can see, rather than in the PR that closes it.

I'm having a bit of a hard time wrapping my head around rows_patch() erroring when there are more cases in y.
My intuition was that this would work a little like left_join(), in that it would only patch rows existing in x and ignore the rest in y that don't match.

At least, that is a feature that is very meaningful to me.
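
One possible way to get this left_join-like behavior, sketched with the data from the original report, is to drop the non-matching rows of y with semi_join() before patching. This is a workaround under the assumption that silently discarding those rows is acceptable; the names y_matched and patched are illustrative only.

```r
library(dplyr)

x <- data.frame(id = c(1, 2, 3), foo = c("a", "b", NA))
y <- data.frame(id = c(1, 3, 5, 7), foo = c("a", "c", "e", "g"))

# Keep only the rows of `y` whose key also exists in `x` (ids 1 and 3).
y_matched <- semi_join(y, x, by = "id")

# Every remaining key exists in `x`, so the patch fills the NA for id 3.
patched <- rows_patch(x, y_matched, by = "id")
patched
#>   id foo
#> 1  1   a
#> 2  2   b
#> 3  3   c
```

Because rows_patch() only overwrites NA values, the existing "a" and "b" are left untouched.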

@DavisVaughan
Member

We typically recommend that you open a new issue that references the old one rather than adding to the conversation on an old issue. That is much easier for us to track!

Also, you can do that in the dev version with unmatched = "ignore".

library(dplyr)

x <- tibble(a = 1:3, b = letters[c(1:2, NA)], c = 0.5 + 0:2)
x
#> # A tibble: 3 × 3
#>       a b         c
#>   <int> <chr> <dbl>
#> 1     1 a       0.5
#> 2     2 b       1.5
#> 3     3 <NA>    2.5

y <- tibble(a = 3:4, b = "z")
y
#> # A tibble: 2 × 2
#>       a b    
#>   <int> <chr>
#> 1     3 z    
#> 2     4 z

rows_patch(x, y, by = "a")
#> Error in `rows_patch()`:
#> ! `y` must contain keys that already exist in `x`.
#> ℹ The following rows in `y` have keys that don't exist in `x`: `c(2)`.
#> ℹ Use `unmatched = "ignore"` if you want to ignore these `y` rows.

#> Backtrace:
#>     ▆
#>  1. ├─dplyr::rows_patch(x, y, by = "a")
#>  2. └─dplyr:::rows_patch.data.frame(x, y, by = "a") at dplyr/R/rows.R:249:2
#>  3.   └─dplyr:::rows_check_y_unmatched(x_key, y_key, unmatched) at dplyr/R/rows.R:281:2
#>  4.     └─rlang::abort(message, call = error_call) at dplyr/R/rows.R:578:6

rows_patch(x, y, by = "a", unmatched = "ignore")
#> # A tibble: 3 × 3
#>       a b         c
#>   <int> <chr> <dbl>
#> 1     1 a       0.5
#> 2     2 b       1.5
#> 3     3 z       2.5

Created on 2022-11-11 with reprex v2.0.2.9000
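
Applied to the data from the original report, the same argument should give the output that was expected there. This is a sketch that assumes the installed dplyr already has the unmatched argument, which the thread confirms only for the development version at the time; the name patched is illustrative.

```r
library(dplyr)

x <- data.frame(id = c(1, 2, 3), foo = c("a", "b", NA))
y <- data.frame(id = c(1, 3, 5, 7), foo = c("a", "c", "e", "g"))

# Assumes a dplyr version that supports `unmatched`.
# Rows of `y` with ids 5 and 7 have no match in `x` and are ignored.
patched <- rows_patch(x, y, by = "id", unmatched = "ignore")
patched$foo
```

The NA for id 3 is replaced by "c", while the rows with ids 5 and 7 are dropped rather than inserted, which is what distinguishes this from rows_upsert().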

@drmowinckels

Sorry, my bad. Will remember in the future. Thanks! This is the functionality I was after.
