
Conversation

@wetneb (Contributor) commented Apr 22, 2020

What changes were proposed in this pull request?

This exposes the filterByRange method from OrderedRDDFunctions in the Java API (as a method of JavaPairRDD).

This is the only method of OrderedRDDFunctions which is not exposed in the Java API so far.

Why are the changes needed?

This improves the consistency between the Scala and Java APIs. Calling the Scala method manually from a Java context is cumbersome as it requires passing many ClassTags.

Does this PR introduce any user-facing change?

Yes, a new method in the Java API.

How was this patch tested?

With unit tests. The implementation of the Scala method is already tested independently and it was not touched in this PR.

Suggesting @srowen as a reviewer.
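For context, the semantics being exposed can be sketched without Spark: when data is range-partitioned, `filterByRange(lower, upper)` only needs to scan partitions whose key bounds overlap the inclusive range, and applies a plain filter within those. A minimal plain-Java illustration of that idea (the `Partition` record and its bounds are hypothetical stand-ins, not Spark API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FilterByRangeSketch {
    // Hypothetical stand-in for a range partition: all keys lie in [minKey, maxKey].
    record Partition(int minKey, int maxKey, Map<Integer, Integer> pairs) {}

    // Keep only pairs whose key lies in the inclusive range [lower, upper],
    // skipping partitions whose key bounds cannot overlap that range.
    static List<Map.Entry<Integer, Integer>> filterByRange(
            List<Partition> partitions, int lower, int upper) {
        List<Map.Entry<Integer, Integer>> out = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.maxKey() < lower || p.minKey() > upper) {
                continue; // whole partition outside the range: no scan needed
            }
            for (Map.Entry<Integer, Integer> e : p.pairs().entrySet()) {
                if (e.getKey() >= lower && e.getKey() <= upper) {
                    out.add(e);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Partition> parts = List.of(
            new Partition(0, 4, Map.of(0, 5, 1, 8, 3, 9)),
            new Partition(5, 9, Map.of(6, 2, 8, 7)));
        // Keys 0, 1, 3 fall in [0, 3]; the second partition is skipped entirely.
        System.out.println(filterByRange(parts, 0, 3).size()); // prints 3
    }
}
```

This is only a sketch of the access pattern; the real implementation lives in Scala's `OrderedRDDFunctions` and is untouched by this PR.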

@wetneb wetneb changed the title SPARK-31518: Expose filterByRange in JavaPairRDD. [SPARK-31518][CORE] Expose filterByRange in JavaPairRDD Apr 22, 2020
@srowen (Member) commented Apr 22, 2020

Jenkins test this please

@srowen (Member) left a comment

Yeah it's minor but legitimate. We have separate Java-friendly methods that take a Comparator for similar reasons.

* performed efficiently by only scanning the partitions that might contain matching elements.
* Otherwise, a standard `filter` is applied to all partitions.
*/
def filterByRange(lower: K, upper: K): JavaPairRDD[K, V] = {
Member

You'll want the @Since("3.1.0") annotations on these methods

Contributor Author

Sure. I am also wondering whether it makes sense to backport this to 2.4, by the way?

Member

No, it's not a bug fix per se. I wouldn't put it in 3.0 even necessarily.

Member

Thank you for your first contribution, @wetneb! +1 for @srowen's answers.

@yaooqinn (Member) commented Apr 22, 2020

Just out of curiosity: of the two "since" marks, Spark's Scala `@Since` annotation and the Javadoc `@since` tag, how do we choose between them? IMHO, the `@since` tag seems better here, to let the version show up in the generated Java API documentation. @dongjoon-hyun @srowen thanks.

@dongjoon-hyun (Member) commented Apr 22, 2020

For the generated Java API doc, you should put `@since` into the comment. For example:

  /**
   * Return an RDD containing only the elements in the inclusive range `lower` to `upper`.
   * If the RDD has been partitioned using a `RangePartitioner`, then this operation can be
   * performed efficiently by only scanning the partitions that might contain matching elements.
   * Otherwise, a standard `filter` is applied to all partitions.
   *
   * @since 3.1.0
   */

Contributor Author

Thanks, I have added that too.

Member

Yep, that's right. My mistake, I was thinking of Scala.

@SparkQA commented Apr 22, 2020

Test build #121630 has finished for PR 28293 at commit b74661d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assertEquals(filteredPairs.get(0), new Tuple2<>(0, 5));
assertEquals(filteredPairs.get(1), new Tuple2<>(1, 8));
assertEquals(filteredPairs.get(2), new Tuple2<>(2, 6));
assertEquals(filteredPairs.get(3), new Tuple2<>(3, 9));
Member

sbt.ForkMain$ForkError: java.lang.AssertionError: expected:<(3,8)> but was:<(3,9)>
	at org.junit.Assert.fail(Assert.java:88)

@dongjoon-hyun (Member) commented Apr 22, 2020

BTW, @wetneb, could you switch the parameters of `assertEquals`? The first parameter should be the expected value and the second the actual value.

Contributor Author

Oops, sorry about this! I thought I had managed to run the test, but I obviously need to improve my setup. Many apologies!
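As an aside for readers following along: JUnit's `assertEquals(expected, actual)` reports its first argument as the expected value, so swapping the arguments produces a misleading failure message like the one quoted above. A small plain-Java sketch of why the order matters (the tiny `assertEquals` helper here is illustrative, mirroring JUnit's message format, not JUnit itself):

```java
public class AssertOrderDemo {
    // Illustrative helper mirroring JUnit's message format:
    // the FIRST argument is reported as "expected".
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        try {
            // Arguments swapped: the failure message blames the wrong value.
            assertEquals(9, 8); // should have been assertEquals(8, 9)
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // prints: expected:<9> but was:<8>
        }
    }
}
```

With the arguments in the conventional order, the message correctly identifies which value came from the test's expectations and which came from the code under test.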

@dongjoon-hyun (Member)

Retest this please.

@SparkQA commented Apr 23, 2020

Test build #121640 has finished for PR 28293 at commit 62dd41c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, all.
Merged to master for Apache Spark 3.1.0.

@dongjoon-hyun (Member)

@wetneb, you have been added to the Apache Spark contributor group, and SPARK-31518 is assigned to you. Thank you again, @wetneb!

@wetneb wetneb deleted the SPARK-31518 branch April 23, 2020 06:18