[SPARK-17902][R] Revive stringsAsFactors option for collect() in SparkR #19551
@@ -499,6 +499,12 @@ test_that("create DataFrame with different data types", {
  expect_equal(collect(df), data.frame(l, stringsAsFactors = FALSE))
})

test_that("SPARK-17902: collect() with stringsAsFactors enabled", {
Contributor: Would you please verify that the factor orders are identical? I wonder if
Member (Author):
> # Ordered vs unordered
> or <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Lo", "Med", "Hi"), ordered=TRUE)
> or1 <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Lo", "Med", "Hi"), ordered=FALSE)
> expect_equal(or, or1)
error: `or` not equal to `or1`.
Attributes: < Component “class”: Lengths (2, 1) differ (string compare on first 1) >
Attributes: < Component “class”: 1 string mismatch >
> # level order mismatch
> or <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Hi", "Lo", "Med"))
> or1 <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Lo", "Med", "Hi"))
> expect_equal(or, or1)
error: `or` not equal to `or1`.
Attributes: < Component “levels”: 3 string mismatches >
> # Data order mismatch
> or <- factor(c("Lo", "Hi", "Med", "Med", "Hi"), levels=c("Hi", "Lo", "Med"))
> or1 <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Hi", "Lo", "Med"))
> expect_equal(or, or1)
error: `or` not equal to `or1`.
4 string mismatches
> or <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Hi", "Lo", "Med"))
> or1 <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels=c("Hi", "Lo", "Med"))
> expect_equal(or, or1)

Would this test address your concern?
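The transcript exercises three properties: orderedness, level order, and data order. As a sketch, the same checks can be written in base R without testthat, which makes each property explicit. The helper name same_factor is hypothetical and is not part of SparkR or this PR:

```r
# Hypothetical helper (not SparkR API): TRUE only if two factors agree on
# orderedness, level order, and the value at every position.
same_factor <- function(a, b) {
  is.factor(a) && is.factor(b) &&
    is.ordered(a) == is.ordered(b) &&        # ordered vs unordered
    identical(levels(a), levels(b)) &&       # level order
    identical(as.integer(a), as.integer(b))  # data order (integer level codes)
}

lv <- c("Hi", "Lo", "Med")
x <- factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels = lv)

same_factor(x, factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels = lv))                  # TRUE
same_factor(x, factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels = rev(lv)))             # FALSE: level order
same_factor(x, factor(c("Lo", "Hi", "Med", "Med", "Hi"), levels = lv))                  # FALSE: data order
same_factor(x, factor(c("Hi", "Med", "Med", "Hi", "Lo"), levels = lv, ordered = TRUE))  # FALSE: orderedness
```

Comparing integer codes only after the levels have matched means equal codes imply equal values.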
Contributor: Thanks!
Contributor: BTW, I think that in the iris data frame all Species values are clustered together. That is why the test passes (the new factor order ends up identical to the existing order).
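One way to probe this without Spark is to shuffle the rows so that Species values are interleaved and then rebuild the factor from its character values. This sketch models only the character-to-factor step with base R's as.factor(). Whether that matches what collect(..., stringsAsFactors = TRUE) does internally is an assumption, because the conversion code is not shown here. In base R, as.factor() on a character vector takes its levels from the sorted unique values, not from the order of first appearance:

```r
# Shuffle the rows so that Species values are no longer clustered.
set.seed(42)
shuffled <- iris[sample(nrow(iris)), ]

# Assumption: the collected column arrives as character and is converted
# with as.factor(), as in the stringsAsFactors path under review.
roundtrip <- as.factor(as.character(shuffled$Species))

# Level order comes from sort(unique(x)), so row order should not affect it.
identical(levels(roundtrip), levels(iris$Species))
identical(roundtrip, shuffled$Species)
```

A SparkR version of the test could feed the shuffled data frame to createDataFrame() instead of iris, so that a clustered input cannot mask an ordering problem.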
  df <- suppressWarnings(collect(createDataFrame(iris), stringsAsFactors = TRUE))
  expect_equal(class(iris$Species), class(df$Species))
  expect_equal(iris$Species, df$Species)
})

test_that("SPARK-17811: can create DataFrame containing NA as date and time", {
  df <- data.frame(
    id = 1:2,
For performance, it may be better to reverse the order of the checks: is.character(vec) && stringsAsFactors

Sure, thanks.
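The order matters at all only because R's && evaluates its right-hand operand only when the left-hand one is TRUE. Below is a minimal sketch of the per-column conversion step using the check order quoted above. The helper name maybe_as_factor is hypothetical; the PR's surrounding collect() code is not reproduced here:

```r
# Hypothetical helper modelling the per-column conversion in collect().
maybe_as_factor <- function(vec, stringsAsFactors) {
  # && short-circuits: for non-character columns the left-hand check is
  # FALSE, so stringsAsFactors is never evaluated for them.
  if (is.character(vec) && stringsAsFactors) {
    vec <- as.factor(vec)
  }
  vec
}

is.factor(maybe_as_factor(c("b", "a", "b"), TRUE))      # TRUE
is.character(maybe_as_factor(c("b", "a", "b"), FALSE))  # TRUE
is.integer(maybe_as_factor(1:3, TRUE))                  # TRUE: non-character columns pass through
```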