SPARK-5270 [CORE] Elegantly check if RDD is empty #4074
srowen wants to merge 4 commits into apache:master from
Test build #25667 has started for PR 4074 at commit
(Oh of course, if this looks good I can add this to Java / Python too)
Test build #25667 has finished for PR 4074 at commit
Test PASSed.
LGTM. What is the use case? Is this part of a bigger PR?
This is all there is to it. It's just a convenience method that implements the check efficiently. Given several questions on the list, it seems that people do want to test for an empty RDD, and there hasn't been an accepted way to do it that is faster than the approach in http://apache-spark-user-list.1001560.n3.nabble.com/Testing-if-an-RDD-is-empty-td1678.html#a1679
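To illustrate why a dedicated check can be cheaper than counting, here is a minimal sketch in plain Python rather than Spark. It is not the code from this PR: `is_empty`, `is_empty_by_count`, and the list-of-iterables stand-in for partitions are all hypothetical names chosen for illustration. The point is only that an emptiness check can stop at the first element it sees, while a count has to visit everything.

```python
from typing import Iterable, List


def is_empty(partitions: List[Iterable[int]]) -> bool:
    """Return True if no partition yields an element.

    Hypothetical model: each inner iterable stands in for one RDD partition.
    Scanning stops at the first element found.
    """
    for part in partitions:
        for _ in part:
            return False  # one element is enough to answer
    return True  # zero partitions, or every partition was empty


def is_empty_by_count(partitions: List[Iterable[int]]) -> bool:
    """Slower baseline: visits every element before answering."""
    return sum(1 for part in partitions for _ in part) == 0
```

Both functions give the same answer; the difference is how much data they touch on a non-empty input.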
Seems reasonable to have, since it's non-obvious how to do it. @srowen, could you add this in Java and Python?
Test build #25682 has started for PR 4074 at commit
Test build #25682 has finished for PR 4074 at commit
Test FAILed.
Jenkins, retest this please.
Test build #25701 has started for PR 4074 at commit
Test build #25701 has finished for PR 4074 at commit
Test FAILed.
I don't think this tests the case where there are multiple partitions but no data in any of the partitions. Maybe add something like
assert(sc.parallelize(Seq(1,2,3), 3).filter(_ < 0).isEmpty())
I think the sc.parallelize(Seq[Int]()) case actually has multiple partitions, but I'll add this too. Also, I'll check the case where the first partition is empty but others aren't.
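The three layouts discussed here can be modeled without Spark. The sketch below is an assumption-laden illustration in plain Python: `slice_into_partitions` is a hypothetical helper that splits a sequence into contiguous chunks, which may not match how Spark actually distributes elements, and `is_empty` is a stand-in for the method under review.

```python
from typing import List, Sequence


def slice_into_partitions(data: Sequence[int], num_slices: int) -> List[List[int]]:
    """Split data into num_slices contiguous chunks; some chunks may be empty.

    Assumed layout for illustration only, not taken from Spark's source.
    """
    n = len(data)
    return [
        list(data[i * n // num_slices:(i + 1) * n // num_slices])
        for i in range(num_slices)
    ]


def is_empty(partitions: List[List[int]]) -> bool:
    """Hypothetical stand-in: empty means every partition has no elements."""
    return all(len(part) == 0 for part in partitions)


# Case 1: an empty source still produces several (empty) partitions.
empty_source = slice_into_partitions([], 3)

# Case 2: data in every partition, then a filter that removes everything.
all_filtered = [[x for x in part if x < 0] for part in slice_into_partitions([1, 2, 3], 3)]

# Case 3: only the first partition ends up empty after filtering.
first_empty = [[x for x in part if x > 1] for part in slice_into_partitions([1, 2, 3], 3)]
```

Under these assumptions, the first two cases should report empty and the third should not, which is why a check that looks only at the first partition would be wrong.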
Test build #25730 has started for PR 4074 at commit
Test build #25730 has finished for PR 4074 at commit
Test PASSed.
Test build #25731 has started for PR 4074 at commit
Test build #25731 has finished for PR 4074 at commit
Test PASSed.
LGTM @srowen - are you still working on it or is it good from your end? Will leave a bit of time for others to comment as well.
@pwendell No more changes from my side.
@srowen Thanks Sean, I committed this with a minor re-word of the title.
Pretty minor, but submitted for consideration -- this would at least help people make this check in the most efficient way I know.
Author: Sean Owen <sowen@cloudera.com>
Closes apache#4074 from srowen/SPARK-5270 and squashes the following commits:
66885b8 [Sean Owen] Add note that JavaRDDLike should not be implemented by user code
2e9b490 [Sean Owen] More tests, and Mima-exclude the new isEmpty method in JavaRDDLike
28395ff [Sean Owen] Add isEmpty to Java, Python
7dd04b7 [Sean Owen] Add efficient RDD.isEmpty()