-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-31518][CORE] Expose filterByRange in JavaPairRDD #28293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Jenkins test this please |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's minor but legitimate. We have separate Java-friendly methods that take a Comparator for similar reasons.
| * performed efficiently by only scanning the partitions that might containt matching elements. | ||
| * Otherwise, a standard `filter` is applied to all partitions. | ||
| */ | ||
| def filterByRange(lower: K, upper: K): JavaPairRDD[K, V] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll want the @Since("3.1.0") annotations on these methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I am also wondering whether it makes sense to backport this in 2.4 by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's not a bug fix per se. I wouldn't put it in 3.0 even necessarily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for curiosity, of the two since marks, Spark's scala @Since annotation and java @since tag, how to choose them? IMHO, @since tag seems better here to let the version show up in the generated Java API documentation. @dongjoon-hyun @srowen thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the generated Java API doc, you should put @since into the comment.
/**
* Return a RDD containing only the elements in the inclusive range `lower` to `upper`.
* If the RDD has been partitioned using a `RangePartitioner`, then this operation can be
* performed efficiently by only scanning the partitions that might containt matching elements.
* Otherwise, a standard `filter` is applied to all partitions.
*
* @since 3.1.0
*/
For example,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have added that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's right, my mistake, I was thinking of Scala
|
Test build #121630 has finished for PR 28293 at commit
|
| assertEquals(filteredPairs.get(0), new Tuple2<>(0, 5)); | ||
| assertEquals(filteredPairs.get(1), new Tuple2<>(1, 8)); | ||
| assertEquals(filteredPairs.get(2), new Tuple2<>(2, 6)); | ||
| assertEquals(filteredPairs.get(3), new Tuple2<>(3, 9)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sbt.ForkMain$ForkError: java.lang.AssertionError: expected:<(3,8)> but was:<(3,9)>
at org.junit.Assert.fail(Assert.java:88)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, @wetneb . Could you switch the parameters of assertEquals? The first parameter should be expected and the second parameter is actual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, sorry about this! I thought I had managed to run the test, but I obviously need to improve my set up. Many apologies!
|
Retest this please. |
|
Test build #121640 has finished for PR 28293 at commit
|
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM. Thank you all
Merged to master for Apache Spark 3.1.0.
|
@wetneb . You are added to Apache Spark contributor group and SPARK-31518 is assigned to you. Thank you again, @wetneb ! |
What changes were proposed in this pull request?
This exposes the
filterByRangemethod fromOrderedRDDFunctionsin the Java API (as a method of JavaPairRDD).This is the only method of
OrderedRDDFunctionswhich is not exposed in the Java API so far.Why are the changes needed?
This improves the consistency between the Scala and Java APIs. Calling the Scala method manually from a Java context is cumbersome as it requires passing many ClassTags.
Does this PR introduce any user-facing change?
Yes, a new method in the Java API.
How was this patch tested?
With unit tests. The implementation of the Scala method is already tested independently and it was not touched in this PR.
Suggesting @srowen as a reviewer.