[SPARK-21333][Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith #18462
coreywoodfield wants to merge 4 commits into apache:master from
Conversation
how about checking if we have tests for these two types (as not supported)?

Yes. We should also capture it and throw an exception. For example, in line 1009, you can do something like
```
if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) {
  throw new AnalysisException("XYZ")
}
```
Then, you can add a test case in DatasetSuite.scala
I added the check as well as tests to make sure exceptions are thrown in the correct cases. If you want me to make the tests more extensive (e.g. test that exceptions are not thrown when valid join types are passed in, etc.), I'd be happy to; just let me know.
```
val ds1 = Seq(1, 2, 3).toDS().as("a")
val ds2 = Seq(1, 2).toDS().as("b")
...
intercept[AnalysisException] {
```
Yeah. Please check the output message.
```
val e = intercept[AnalysisException] {
  ...
}.getMessage
assert(e.contains("xyz"))
```
Please create a JIRA in https://issues.apache.org/jira/projects/SPARK/summary and put the JIRA number in your PR title. You can follow http://spark.apache.org/contributing.html

I added a check on the output message and created a JIRA. I think that should cover everything. Thanks for all your guidance in this.
```
Some(condition.expr))).analyzed.asInstanceOf[Join]

if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) {
  throw new AnalysisException("Invalid join type in joinWith: " + joined.joinType)
}
```

Also made tests more robust and less likely to break if changes are made to join types.
retest this please

ok to test

Test build #79354 has finished for PR 18462 at commit

@gatorsmile Have you had a chance to look at this since the tests passed? Is there anything else that needs to be done?

Sorry, I forgot it.

retest this please

LGTM

Test build #79742 has finished for PR 18462 at commit

retest this please

Test build #79767 has finished for PR 18462 at commit
…#joinWith
## What changes were proposed in this pull request?
Two invalid join types were mistakenly listed in the javadoc for joinWith in the Dataset class. I presume these were copied from the javadoc of join, but since joinWith returns a Dataset\<Tuple2\>, left_semi and left_anti are invalid, as they only return values from one of the datasets, instead of from both.
## How was this patch tested?
I ran the following code:
```
public static void main(String[] args) {
SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
Dataset<Row> one = spark.createDataFrame(Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
Dataset<Row> two = spark.createDataFrame(Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}
```
which tests all the different join types, and the last two (left_semi and left_anti) threw exceptions. The same code using join instead of joinWith ran fine. The Bean class was just a Java bean with a single int field, x.
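The guard this PR adds inside Dataset#joinWith can also be sketched without a Spark dependency. The `JoinWithCheck` class and `validate` method below are hypothetical stand-ins for illustration, not Spark's API; they only mirror the idea of rejecting the two one-sided join types before execution:

```java
// Hypothetical stand-in for the guard added to Dataset#joinWith.
// left_semi and left_anti return rows from only one side of the join,
// so they cannot produce the (left, right) pairs that joinWith promises.
public class JoinWithCheck {
    static void validate(String joinType) {
        // Accept either spelling a caller might use: "left_semi" or "leftsemi".
        String normalized = joinType.toLowerCase().replace("_", "");
        if (normalized.equals("leftsemi") || normalized.equals("leftanti")) {
            throw new IllegalArgumentException(
                "Invalid join type in joinWith: " + joinType);
        }
    }

    public static void main(String[] args) {
        validate("inner");      // valid: passes silently
        validate("full_outer"); // valid: passes silently
        try {
            validate("left_semi");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast with a clear message, rather than letting the analyzer produce a confusing downstream error, is the same design choice the actual patch makes with AnalysisException.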
Author: Corey Woodfield <coreywoodfield@gmail.com>
Closes #18462 from coreywoodfield/master.
(cherry picked from commit 8cd9cdf)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Thanks! Merging to master/2.2