Skip to content

[SPARK-21333][Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith#18462

Closed
coreywoodfield wants to merge 4 commits intoapache:masterfrom
coreywoodfield:master
Closed

[SPARK-21333][Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith#18462
coreywoodfield wants to merge 4 commits intoapache:masterfrom
coreywoodfield:master

Conversation

@coreywoodfield
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Two invalid join types were mistakenly listed in the javadoc for joinWith, in the Dataset class. I presume these were copied from the javadoc of join, but since joinWith returns a Dataset<Tuple2>, left_semi and left_anti are invalid, as they only return values from one of the datasets, instead of from both

How was this patch tested?

I ran the following code :

public static void main(String[] args) {
	SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
	Dataset<Row> one = spark.createDataFrame(Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
	Dataset<Row> two = spark.createDataFrame(Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);
		
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}

which tests all the different join types, and the last two (left_semi and left_anti) threw exceptions. The same code using join instead of joinWith did fine. The Bean class was just a java bean with a single int field, x.

@coreywoodfield coreywoodfield changed the title Removed invalid joinTypes from javadoc of Dataset#joinWith [Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith Jun 29, 2017
@felixcheung
Copy link
Copy Markdown
Member

how about checking if we have tests for these two types (as not supported)?

@gatorsmile
Copy link
Copy Markdown
Member

Yes. We should also capture it and throw an exception. For example, in line 1009, you can do something like

    if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) {
      throw new AnalysisException("XYZ")
    }

Then, you can add a test case in DatasetSuite.scala

@coreywoodfield
Copy link
Copy Markdown
Contributor Author

I added the check as well as tests to make sure exceptions are thrown in the correct cases. If you want me to make the test more extensive (e.g. test that exceptions are not thrown when valid join types are passed in, etc.) I'd be happy to, just let me know

val ds1 = Seq(1, 2, 3).toDS().as("a")
val ds2 = Seq(1, 2).toDS().as("b")

intercept[AnalysisException] {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Please check the output message.

val e = intercept[AnalysisException] {
  ...
}.getMessages
assert(e.contains("xyz"))

@gatorsmile
Copy link
Copy Markdown
Member

Please create a JIRA in https://issues.apache.org/jira/projects/SPARK/summary and put the JIRA number in your PR title. You can follow http://spark.apache.org/contributing.html

@coreywoodfield coreywoodfield changed the title [Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith [SPARK-21333][Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith Jul 7, 2017
@coreywoodfield
Copy link
Copy Markdown
Contributor Author

I added a check on the output message and created a JIRA. I think that should cover everything. Thanks for all your guidance in this

Some(condition.expr))).analyzed.asInstanceOf[Join]

if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) {
throw new AnalysisException("Invalid join type in joinWith: " + joined.joinType)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: joined.joinType.sql?

Also made tests more robust and less likely to break if changes are made to joinTypes
@gatorsmile
Copy link
Copy Markdown
Member

retest this please

@gatorsmile
Copy link
Copy Markdown
Member

ok to test

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jul 8, 2017

Test build #79354 has finished for PR 18462 at commit 9aee54a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@coreywoodfield
Copy link
Copy Markdown
Contributor Author

coreywoodfield commented Jul 19, 2017

@gatorsmile Have you had a chance to look at this since the tests passed? Is there anything else that needs to be done?

@gatorsmile
Copy link
Copy Markdown
Member

sorry, I forgot it.

@gatorsmile
Copy link
Copy Markdown
Member

retest this please

@gatorsmile
Copy link
Copy Markdown
Member

LGTM

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jul 19, 2017

Test build #79742 has finished for PR 18462 at commit 9aee54a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Copy Markdown
Member

retest this please

@SparkQA
Copy link
Copy Markdown

SparkQA commented Jul 19, 2017

Test build #79767 has finished for PR 18462 at commit 9aee54a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jul 19, 2017
…#joinWith

## What changes were proposed in this pull request?

Two invalid join types were mistakenly listed in the javadoc for joinWith, in the Dataset class. I presume these were copied from the javadoc of join, but since joinWith returns a Dataset\<Tuple2\>, left_semi and left_anti are invalid, as they only return values from one of the datasets, instead of from both

## How was this patch tested?

I ran the following code :
```
public static void main(String[] args) {
	SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
	Dataset<Row> one = spark.createDataFrame(Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
	Dataset<Row> two = spark.createDataFrame(Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);

	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}
```
which tests all the different join types, and the last two (left_semi and left_anti) threw exceptions. The same code using join instead of joinWith did fine. The Bean class was just a java bean with a single int field, x.

Author: Corey Woodfield <coreywoodfield@gmail.com>

Closes #18462 from coreywoodfield/master.

(cherry picked from commit 8cd9cdf)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
@gatorsmile
Copy link
Copy Markdown
Member

Thanks! Merging to master/2.2

@asfgit asfgit closed this in 8cd9cdf Jul 19, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…#joinWith

## What changes were proposed in this pull request?

Two invalid join types were mistakenly listed in the javadoc for joinWith, in the Dataset class. I presume these were copied from the javadoc of join, but since joinWith returns a Dataset\<Tuple2\>, left_semi and left_anti are invalid, as they only return values from one of the datasets, instead of from both

## How was this patch tested?

I ran the following code :
```
public static void main(String[] args) {
	SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
	Dataset<Row> one = spark.createDataFrame(Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
	Dataset<Row> two = spark.createDataFrame(Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);

	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
	try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}
```
which tests all the different join types, and the last two (left_semi and left_anti) threw exceptions. The same code using join instead of joinWith did fine. The Bean class was just a java bean with a single int field, x.

Author: Corey Woodfield <coreywoodfield@gmail.com>

Closes apache#18462 from coreywoodfield/master.

(cherry picked from commit 8cd9cdf)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants