[SPARK-21333][Docs] Removed invalid joinTypes from javadoc of Dataset#joinWith #18462
coreywoodfield wants to merge 4 commits into apache:master from
Conversation
how about checking if we have tests for these two types (as not supported)?

Yes. We should also capture it and throw an exception. For example, in line 1009, you can do something like
```
if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) {
  throw new AnalysisException("XYZ")
}
```
Then, you can add a test case in DatasetSuite.scala
I added the check as well as tests to make sure exceptions are thrown in the correct cases. If you want me to make the tests more extensive (e.g. test that exceptions are not thrown when valid join types are passed in, etc.), I'd be happy to; just let me know.
```
val ds1 = Seq(1, 2, 3).toDS().as("a")
val ds2 = Seq(1, 2).toDS().as("b")
...
intercept[AnalysisException] {
```
Yeah. Please check the output message.
```
val e = intercept[AnalysisException] {
  ...
}.getMessage
assert(e.contains("xyz"))
```
Please create a JIRA in https://issues.apache.org/jira/projects/SPARK/summary and put the JIRA number in your PR title. You can follow http://spark.apache.org/contributing.html

I added a check on the output message and created a JIRA. I think that should cover everything. Thanks for all your guidance in this.
```
Some(condition.expr))).analyzed.asInstanceOf[Join]

if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) {
  throw new AnalysisException("Invalid join type in joinWith: " + joined.joinType)
}
```

Also made tests more robust and less likely to break if changes are made to join types.
retest this please

ok to test

Test build #79354 has finished for PR 18462 at commit

@gatorsmile Have you had a chance to look at this since the tests passed? Is there anything else that needs to be done?

Sorry, I forgot it.

retest this please

LGTM

Test build #79742 has finished for PR 18462 at commit

retest this please

Test build #79767 has finished for PR 18462 at commit
…#joinWith
## What changes were proposed in this pull request?
Two invalid join types were mistakenly listed in the javadoc for joinWith in the Dataset class. I presume these were copied from the javadoc of join, but since joinWith returns a Dataset\<Tuple2\>, left_semi and left_anti are invalid, as they only return values from one of the datasets, instead of from both.
## How was this patch tested?
I ran the following code:
```
public static void main(String[] args) {
SparkSession spark = new SparkSession(new SparkContext("local[*]", "Test"));
Dataset<Row> one = spark.createDataFrame(Arrays.asList(new Bean(1), new Bean(2), new Bean(3), new Bean(4), new Bean(5)), Bean.class);
Dataset<Row> two = spark.createDataFrame(Arrays.asList(new Bean(4), new Bean(5), new Bean(6), new Bean(7), new Bean(8), new Bean(9)), Bean.class);
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "inner").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "cross").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "full_outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "right_outer").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_semi").show();} catch (Exception e) {e.printStackTrace();}
try {two.joinWith(one, one.col("x").equalTo(two.col("x")), "left_anti").show();} catch (Exception e) {e.printStackTrace();}
}
```
which tests all the different join types, and the last two (left_semi and left_anti) threw exceptions. The same code using join instead of joinWith ran fine. The Bean class was just a Java bean with a single int field, x.
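The guard this PR adds inside Dataset#joinWith can also be sketched without a Spark dependency. The `JoinWithCheck` class and `validate` method below are hypothetical stand-ins for illustration, not Spark's API; they only mirror the idea of rejecting the two one-sided join types before execution:

```java
// Hypothetical stand-in for the guard added to Dataset#joinWith.
// left_semi and left_anti return rows from only one side of the join,
// so they cannot produce the (left, right) pairs that joinWith promises.
public class JoinWithCheck {
    static void validate(String joinType) {
        // Accept either spelling a caller might use: "left_semi" or "leftsemi".
        String normalized = joinType.toLowerCase().replace("_", "");
        if (normalized.equals("leftsemi") || normalized.equals("leftanti")) {
            throw new IllegalArgumentException(
                "Invalid join type in joinWith: " + joinType);
        }
    }

    public static void main(String[] args) {
        validate("inner");      // valid: passes silently
        validate("full_outer"); // valid: passes silently
        try {
            validate("left_semi");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast with a clear message, rather than letting the analyzer produce a confusing downstream error, is the same design choice the actual patch makes with AnalysisException.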
Author: Corey Woodfield <coreywoodfield@gmail.com>
Closes #18462 from coreywoodfield/master.
(cherry picked from commit 8cd9cdf)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
Thanks! Merging to master/2.2