
Conversation

@wetneb (Contributor) commented Apr 22, 2020

What changes were proposed in this pull request?

This exposes the filterByRange method from OrderedRDDFunctions in the Java API (as a method of JavaPairRDD).

This is the only method of OrderedRDDFunctions which is not exposed in the Java API so far.

Why are the changes needed?

This improves the consistency between the Scala and Java APIs. Calling the Scala method manually from a Java context is cumbersome as it requires passing many ClassTags.

Does this PR introduce any user-facing change?

Yes, a new method in the Java API.

How was this patch tested?

With unit tests. The implementation of the Scala method is already tested independently and it was not touched in this PR.

Suggesting @srowen as a reviewer.
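For context, the semantics being exposed can be sketched without Spark: when data is range-partitioned, `filterByRange(lower, upper)` only needs to scan partitions whose key bounds overlap the inclusive range, and applies a plain filter within those. A minimal plain-Java illustration of that idea (the `Partition` record and its bounds are hypothetical stand-ins, not Spark API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FilterByRangeSketch {
    // Hypothetical stand-in for a range partition: all keys lie in [minKey, maxKey].
    record Partition(int minKey, int maxKey, Map<Integer, Integer> pairs) {}

    // Keep only pairs whose key lies in the inclusive range [lower, upper],
    // skipping partitions whose key bounds cannot overlap that range.
    static List<Map.Entry<Integer, Integer>> filterByRange(
            List<Partition> partitions, int lower, int upper) {
        List<Map.Entry<Integer, Integer>> out = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.maxKey() < lower || p.minKey() > upper) {
                continue; // whole partition outside the range: no scan needed
            }
            for (Map.Entry<Integer, Integer> e : p.pairs().entrySet()) {
                if (e.getKey() >= lower && e.getKey() <= upper) {
                    out.add(e);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Partition> parts = List.of(
            new Partition(0, 4, Map.of(0, 5, 1, 8, 3, 9)),
            new Partition(5, 9, Map.of(6, 2, 8, 7)));
        // Keys 0, 1, 3 fall in [0, 3]; the second partition is skipped entirely.
        System.out.println(filterByRange(parts, 0, 3).size()); // prints 3
    }
}
```

This is only a sketch of the access pattern; the real implementation lives in Scala's `OrderedRDDFunctions` and is untouched by this PR.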

@wetneb wetneb changed the title SPARK-31518: Expose filterByRange in JavaPairRDD. [SPARK-31518][CORE] Expose filterByRange in JavaPairRDD Apr 22, 2020
@srowen (Member) commented Apr 22, 2020

Jenkins test this please

@srowen (Member) left a comment

Yeah it's minor but legitimate. We have separate Java-friendly methods that take a Comparator for similar reasons.

* performed efficiently by only scanning the partitions that might contain matching elements.
* Otherwise, a standard `filter` is applied to all partitions.
*/
def filterByRange(lower: K, upper: K): JavaPairRDD[K, V] = {
Member

You'll want the @Since("3.1.0") annotations on these methods

Contributor Author

Sure. I am also wondering whether it makes sense to backport this to 2.4, by the way?

Member

No, it's not a bug fix per se. I wouldn't put it in 3.0 even necessarily.

Member

Thank you for your first contribution, @wetneb! +1 for @srowen's answers.

@yaooqinn (Member) commented Apr 22, 2020

Just out of curiosity: of the two "since" marks, Spark's Scala `@Since` annotation and the Javadoc `@since` tag, how do we choose between them? IMHO, the `@since` tag seems better here, to let the version show up in the generated Java API documentation. @dongjoon-hyun @srowen thanks.

@dongjoon-hyun (Member) commented Apr 22, 2020

For the generated Java API doc, you should put `@since` into the comment. For example:

  /**
   * Return an RDD containing only the elements in the inclusive range `lower` to `upper`.
   * If the RDD has been partitioned using a `RangePartitioner`, then this operation can be
   * performed efficiently by only scanning the partitions that might contain matching elements.
   * Otherwise, a standard `filter` is applied to all partitions.
   *
   * @since 3.1.0
   */

Contributor Author

Thanks, I have added that too.

Member

Yep, that's right. My mistake, I was thinking of Scala.

@SparkQA commented Apr 22, 2020

Test build #121630 has finished for PR 28293 at commit b74661d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assertEquals(filteredPairs.get(0), new Tuple2<>(0, 5));
assertEquals(filteredPairs.get(1), new Tuple2<>(1, 8));
assertEquals(filteredPairs.get(2), new Tuple2<>(2, 6));
assertEquals(filteredPairs.get(3), new Tuple2<>(3, 9));
Member

sbt.ForkMain$ForkError: java.lang.AssertionError: expected:<(3,8)> but was:<(3,9)>
	at org.junit.Assert.fail(Assert.java:88)

@dongjoon-hyun (Member) commented Apr 22, 2020

BTW, @wetneb, could you switch the parameters of `assertEquals`? The first parameter should be the expected value and the second the actual value.

Contributor Author

Oops, sorry about this! I thought I had managed to run the test, but I obviously need to improve my setup. Many apologies!
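As an aside for readers following along: JUnit's `assertEquals(expected, actual)` reports its first argument as the expected value, so swapping the arguments produces a misleading failure message like the one quoted above. A small plain-Java sketch of why the order matters (the tiny `assertEquals` helper here is illustrative, mirroring JUnit's message format, not JUnit itself):

```java
public class AssertOrderDemo {
    // Illustrative helper mirroring JUnit's message format:
    // the FIRST argument is reported as "expected".
    static void assertEquals(Object expected, Object actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError(
                "expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    public static void main(String[] args) {
        try {
            // Arguments swapped: the failure message blames the wrong value.
            assertEquals(9, 8); // should have been assertEquals(8, 9)
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // prints: expected:<9> but was:<8>
        }
    }
}
```

With the arguments in the conventional order, the message correctly identifies which value came from the test's expectations and which came from the code under test.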

@dongjoon-hyun (Member)

Retest this please.

@SparkQA commented Apr 23, 2020

Test build #121640 has finished for PR 28293 at commit 62dd41c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, all.
Merged to master for Apache Spark 3.1.0.

@dongjoon-hyun (Member)

@wetneb, you have been added to the Apache Spark contributor group, and SPARK-31518 is assigned to you. Thank you again, @wetneb!

@wetneb wetneb deleted the SPARK-31518 branch April 23, 2020 06:18