dplyr can't summarize this variable #2919
While this may not directly answer your question, you can simply omit the missing values first:
test2 %>%
filter(!is.na(russia)) %>% # takes out all potentially problematic entries
summarise(m = mean(russia)) |
Thanks. Yes, that works. But it still doesn't explain why dplyr can't summarize that variable. |
Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session. |
Here's a full example using reprex, including downloading the zip file I posted:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rio)
test2 <- import("https://github.com/tidyverse/dplyr/files/1105096/test2.zip")
## Returns NA
test2 %>%
summarize_at("russia", funs(m = mean(., na.rm = TRUE)))
#> m
#> 1 NA
##Returns 5.5ish
mean(test2$russia, na.rm = TRUE)
#> [1] 5.516387 |
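The comparison above (dplyr returning NA while base R returns a number) can be packaged as a quick sanity check. This is a minimal sketch on a small, made-up data frame: the `toy` values are hypothetical, not taken from the GSS extract, and such a tiny input is not guaranteed to trigger the bug reported in this thread. On a correctly built dplyr, both routes should agree.

```r
library(dplyr)

# Hypothetical toy data: a numeric column with some missing values
toy <- data.frame(russia = c(5, NA, 7, 4, NA, 6))

# Mean computed through dplyr, dropping NAs inside the summary
via_dplyr <- toy %>% summarise(m = mean(russia, na.rm = TRUE))

# Same mean computed with base R
via_base <- mean(toy$russia, na.rm = TRUE)

# TRUE when both routes agree; an NA from dplyr here would point to
# a broken installation rather than to the data
isTRUE(all.equal(via_dplyr$m, via_base))
```

Here both values should be 5.5 (the mean of 5, 7, 4 and 6).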
Just FYI: I updated dplyr (to 0.7.3, the latest CRAN release) and this issue is still there. |
Just adding this reprex; it seems that it works if you filter out the missing values first:
library(dplyr)
library(rio)
test2 <- import("https://github.com/tidyverse/dplyr/files/1105096/test2.zip")
test2 %>%
summarise_at("russia", mean, na.rm = TRUE)
#> russia
#> 1 NA
test2 %>%
filter(!is.na(russia)) %>%
summarise_at("russia", mean, na.rm = TRUE)
#> russia
#> 1 5.516387
test3 <- test2 %>%
filter(!is.na(russia))
test3 %>%
summarise_at("russia", mean)
#> russia
#> 1 5.516387
test2 %>%
filter(!is.na(russia)) %>%
summarize_at("russia", funs(m = mean(., na.rm = TRUE)))
#> m
#> 1 5.516387
Created on 2017-12-28 by the reprex package (v0.1.1.9000). |
Thanks for the reprex. This appears to work now with the CRAN versions of dplyr and rlang, though I don't know why. Can you confirm?
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rio)
test2 <- import("https://github.com/tidyverse/dplyr/files/1105096/test2.zip")
test2 %>%
summarise_at("russia", mean, na.rm = TRUE)
#> russia
#> 1 5.516387
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.3 (2017-11-30)
#> system x86_64, linux-gnu
#> ui X11
#> language en_US
#> collate en_US.UTF-8
#> tz Europe/Busingen
#> date 2018-01-19
#> Packages -----------------------------------------------------------------
#> package * version date source
#> assertthat 0.2.0 2017-04-11 CRAN (R 3.4.1)
#> backports 1.1.2 2017-12-13 cran (@1.1.2)
#> base * 3.4.3 2017-12-01 local
#> bindr 0.1 2017-06-15 local
#> bindrcpp 0.2 2017-06-18 local (krlmlr/bindrcpp@dfce02c)
#> cellranger 1.1.0 2016-07-27 CRAN (R 3.4.0)
#> compiler 3.4.3 2017-12-01 local
#> curl 3.1 2017-12-12 CRAN (R 3.4.3)
#> data.table 1.10.4-3 2017-10-27 CRAN (R 3.4.3)
#> datasets * 3.4.3 2017-12-01 local
#> devtools 1.13.4 2017-11-09 CRAN (R 3.4.2)
#> digest 0.6.13 2017-12-14 CRAN (R 3.4.3)
#> dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.3)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
#> forcats 0.2.0.9000 2017-09-27 local
#> foreign 0.8-69 2017-06-21 CRAN (R 3.4.1)
#> glue 1.2.0.9000 2017-11-22 Github (tidyverse/glue@752458e)
#> graphics * 3.4.3 2017-12-01 local
#> grDevices * 3.4.3 2017-12-01 local
#> haven 1.1.0 2017-07-09 CRAN (R 3.4.1)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.1)
#> knitr 1.18 2017-12-27 CRAN (R 3.4.3)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.3)
#> memoise 1.1.0 2017-08-07 Github (hadley/memoise@d63ae9c)
#> methods * 3.4.3 2017-12-01 local
#> openxlsx 4.0.29 2017-11-21 local
#> pillar 1.0.99.9001 2018-01-14 local (r-lib/pillar@9d96835)
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.1)
#> R6 2.2.2 2017-06-17 CRAN (R 3.4.1)
#> Rcpp 0.12.14.5 2018-01-11 local
#> readxl 1.0.0 2017-04-18 CRAN (R 3.4.3)
#> rio * 0.5.5 2017-06-18 CRAN (R 3.4.3)
#> rlang 0.1.6 2017-12-21 CRAN (R 3.4.3)
#> rmarkdown 1.8 2017-11-17 CRAN (R 3.4.3)
#> rprojroot 1.3-2 2018-01-03 local (krlmlr/rprojroot@851d293)
#> stats * 3.4.3 2017-12-01 local
#> stringi 1.1.6 2017-11-17 CRAN (R 3.4.3)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.4.1)
#> tibble 1.4.1.9000 2018-01-15 local
#> tools 3.4.3 2017-12-01 local
#> utils * 3.4.3 2017-12-01 local
#> withr 2.1.1.9000 2017-12-30 Github (r-lib/withr@df18523)
#> yaml 2.1.16 2017-12-12 CRAN (R 3.4.3)
#> zip 1.0.0 2017-04-25 CRAN (R 3.4.2) |
I still experience this problem. Session info:
|
Just to double-check, can you please run the following code and paste the results from the clipboard:
reprex::reprex(si = TRUE, {
library(dplyr)
library(rio)
test2 <- import("https://github.com/tidyverse/dplyr/files/1105096/test2.zip")
test2 %>%
summarize_at("russia", mean, na.rm = TRUE)
}) |
```r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(rio)
test2 <- import("https://github.com/tidyverse/dplyr/files/1105096/test2.zip")
test2 %>%
summarize_at("russia", mean, na.rm = TRUE)
#> russia
#> 1 NA
```
<details>
<summary>Session info</summary>
``` r
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.3 (2017-11-30)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz America/Chicago
#> date 2018-01-19
#> Packages -----------------------------------------------------------------
#> package * version date source
#> assertthat 0.2.0 2017-04-11 CRAN (R 3.4.2)
#> backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
#> base * 3.4.3 2017-11-30 local
#> bindr 0.1 2016-11-13 CRAN (R 3.4.2)
#> bindrcpp 0.2 2017-06-17 CRAN (R 3.4.2)
#> cellranger 1.1.0 2016-07-27 CRAN (R 3.4.2)
#> compiler 3.4.3 2017-11-30 local
#> curl 3.1 2017-12-12 CRAN (R 3.4.3)
#> data.table 1.10.4-3 2017-10-27 CRAN (R 3.4.2)
#> datasets * 3.4.3 2017-11-30 local
#> devtools 1.13.4 2017-11-09 CRAN (R 3.4.3)
#> digest 0.6.14 2018-01-14 CRAN (R 3.4.3)
#> dplyr * 0.7.4 2017-09-28 CRAN (R 3.4.2)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.2)
#> forcats 0.2.0 2017-01-23 CRAN (R 3.4.2)
#> foreign 0.8-69 2017-06-22 CRAN (R 3.4.3)
#> glue 1.2.0 2017-10-29 CRAN (R 3.4.2)
#> graphics * 3.4.3 2017-11-30 local
#> grDevices * 3.4.3 2017-11-30 local
#> haven 1.1.1 2018-01-18 CRAN (R 3.4.3)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.2)
#> knitr 1.18 2017-12-27 CRAN (R 3.4.3)
#> magrittr 1.5 2014-11-22 CRAN (R 3.4.2)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.4.3)
#> methods * 3.4.3 2017-11-30 local
#> openxlsx 4.0.17 2017-03-23 CRAN (R 3.4.2)
#> pillar 1.1.0 2018-01-14 CRAN (R 3.4.3)
#> pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.2)
#> R6 2.2.2 2017-06-17 CRAN (R 3.4.2)
#> Rcpp 0.12.14 2017-11-23 CRAN (R 3.4.3)
#> readxl 1.0.0 2017-04-18 CRAN (R 3.4.2)
#> rio * 0.5.5 2017-06-18 CRAN (R 3.4.2)
#> rlang 0.1.6 2017-12-21 CRAN (R 3.4.3)
#> rmarkdown 1.8 2017-11-17 CRAN (R 3.4.2)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
#> stats * 3.4.3 2017-11-30 local
#> stringi 1.1.6 2017-11-17 CRAN (R 3.4.2)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.4.2)
#> tibble 1.4.1 2017-12-25 CRAN (R 3.4.3)
#> tools 3.4.3 2017-11-30 local
#> utils * 3.4.3 2017-11-30 local
#> withr 2.1.1 2017-12-19 CRAN (R 3.4.3)
#> yaml 2.1.16 2017-12-12 CRAN (R 3.4.3)
```
</details>
(Sent by email in reply to Kirill Müller's request of Fri 19 Jan 2018, 16:01.)
|
Thanks. I'm completely at a loss here. I need to compare the package versions to check whether they differ. |
Here is that comparison:
diff -u /tmp/his.txt /tmp/mine.txt
--- /tmp/his.txt 2018-01-19 10:50:38.575592738 -0600
+++ /tmp/mine.txt 2018-01-19 10:50:35.832291330 -0600
@@ -4,9 +4,9 @@
#> version R version 3.4.3 (2017-11-30)
#> system x86_64, linux-gnu
#> ui X11
-#> language en_US
+#> language (EN)
#> collate en_US.UTF-8
-#> tz Europe/Busingen
+#> tz America/Chicago
#> date 2018-01-19
#> Packages -----------------------------------------------------------------
#> package * version
@@ -21,25 +21,25 @@
#> data.table 1.10.4-3
#> datasets * 3.4.3
#> devtools 1.13.4
-#> digest 0.6.13
+#> digest 0.6.14
#> dplyr * 0.7.4
#> evaluate 0.10.1
-#> forcats 0.2.0.9000
+#> forcats 0.2.0
#> foreign 0.8-69
-#> glue 1.2.0.9000
+#> glue 1.2.0
#> graphics * 3.4.3
#> grDevices * 3.4.3
-#> haven 1.1.0
+#> haven 1.1.1
#> htmltools 0.3.6
#> knitr 1.18
#> magrittr 1.5
#> memoise 1.1.0
#> methods * 3.4.3
-#> openxlsx 4.0.29
-#> pillar 1.0.99.9001
+#> openxlsx 4.0.17
+#> pillar 1.1.0
#> pkgconfig 2.0.1
#> R6 2.2.2
-#> Rcpp 0.12.14.5
+#> Rcpp 0.12.14
#> readxl 1.0.0
#> rio * 0.5.5
#> rlang 0.1.6
@@ -48,9 +48,8 @@
#> stats * 3.4.3
#> stringi 1.1.6
#> stringr 1.2.0
-#> tibble 1.4.1.9000
+#> tibble 1.4.1
#> tools 3.4.3
#> utils * 3.4.3
-#> withr 2.1.1.9000
+#> withr 2.1.1
#> yaml 2.1.16
-#> zip 1.0.0
Diff finished. Fri Jan 19 10:50:43 2018
|
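The diff above was made from two pasted session-info dumps. A minimal sketch for producing comparable listings directly, assuming you want every package in the library rather than only the attached ones; `snapshot_versions()` is a hypothetical helper, not part of any package. It writes one sorted "package version" line per installed package so the files from two machines can be compared with `diff -u`.

```r
# Hypothetical helper: write one "package version" line per installed
# package (across all .libPaths()), sorted by name, so listings from
# two machines can be compared with `diff -u`.
snapshot_versions <- function(path) {
  ip <- installed.packages()[, c("Package", "Version"), drop = FALSE]
  ip <- ip[order(ip[, "Package"]), , drop = FALSE]
  writeLines(paste(ip[, "Package"], ip[, "Version"]), path)
  invisible(ip)
}

# Usage: run on each machine, then diff the two files in a shell
out <- file.path(tempdir(), "versions.txt")
snapshot_versions(out)
```

Unlike a session-info dump, this also lists packages that were never loaded, which matters when the suspect is a build-time dependency.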
I just re-ran this with the latest development versions of all the packages where our versions differed, except for Rcpp (I couldn't find 0.12.14.5), digest and haven (where my version was newer than yours), and openxlsx (it's not an Excel file), and got the same results. I was also able to reproduce this on a macOS computer with up-to-date packages from CRAN. |
Very strange indeed. This is what I just tried, with r-lib/withr#66, in a vanilla session:
withr::with_temp_libpaths(action = "replace", {
install.packages(c("dplyr", "rio"))
library(dplyr)
library(rio)
test2 <- import("https://github.com/tidyverse/dplyr/files/1105096/test2.zip")
test2 %>%
summarize_at("russia", mean, na.rm = TRUE)
})
I got:
|
OK, if I do that I also get the same result. But I still get NA in a "normal" R session, even after making sure all R packages are up to date and deleting my Rprofile file. |
Maybe a hidden install-time dependency? Can you post a snapshot of your library, and your OS and version? I wonder if I can replicate the problem on my machine. The only other option I see would be undefined behavior. |
How do you suggest I do that? I tried using |
Just zip your |
D'oh! Warning, large download (~56M): https://www.dropbox.com/s/oh2awr5fiby3759/r.tar.gz?dl=0
R version 3.4.3 (2017-11-30)
Matrix products: default |
Mounted your library into a Docker container built with the following
Got So:
I'm going to reinstall all packages already present in your library, one by one, and check which package fixes the problem. Will adapt the Dockerfile to take a copy from your precious library first. |
So, reinstalling dplyr appears to resolve the problem. Can you please post the output of |
And also on the OS X system, if you can? |
Warning message:
And indeed, reinstalling dplyr fixes the issue! I don't have access to the Mac right now, but I'll take a look later. Thanks for digging into this very peculiar bug! |
Thank you for furnishing me with the input I asked for! Leaving this open for now; I'll file an issue with Rcpp that points here. |
This problem disappears if dplyr is compiled against Rcpp >= 0.12.15. Just installing Rcpp 0.12.15 is not enough. In the following
The problem here is fixed with RcppCore/Rcpp#790, which is included in Rcpp 0.12.15. |
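Because installing a newer Rcpp alone does not help, dplyr's compiled code has to be rebuilt against it. A minimal sketch, assuming a working C++ toolchain and access to a CRAN mirror; `rebuild_dplyr()` is a hypothetical helper, and it is best run in a fresh R session before dplyr or Rcpp are loaded.

```r
# Hypothetical helper: make sure Rcpp is at least `min_rcpp`, then
# reinstall dplyr from source so it is compiled against that Rcpp.
# Assumes a C++ toolchain and a reachable CRAN mirror.
rebuild_dplyr <- function(min_rcpp = "0.12.15") {
  if (packageVersion("Rcpp") < min_rcpp) {
    install.packages("Rcpp")
  }
  # type = "source" forces compilation instead of a prebuilt binary
  install.packages("dplyr", type = "source")
}
```

Calling `rebuild_dplyr()` and then restarting R should leave a dplyr build linked against the fixed Rcpp headers.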
I'm working with a data.frame and dplyr returns NA for all summaries for this variable. Here's the data (from the General Social Survey). Sorry for the zip file; GitHub won't let me upload the file directly.
test2.zip
and the R code. Note that you can change the summarize statement to anything (e.g. summarize(m = mean(russia, na.rm = TRUE))) and it'll still return NA:
The data aren't crazy (and not all the values for "russia" are missing):
Am I missing something really simple here?