[SPARK-14042][CORE] Add custom coalescer support #11865
Conversation
Since HadoopPartition is not public, a user who wants to implement this outside of Spark may have some trouble.
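(Illustration only, not from the thread: without access to HadoopPartition, a coalescer written outside Spark can only rely on public information such as the parent RDD's preferred locations; the helper below is hypothetical.)

```scala
import org.apache.spark.Partition
import org.apache.spark.rdd.RDD

// Hypothetical helper: lacking access to HadoopPartition and its InputSplit,
// a third-party coalescer can still ask the parent RDD for each partition's
// preferred hosts through the public preferredLocations API.
def preferredHost(parent: RDD[_], partition: Partition): Option[String] =
  parent.preferredLocations(partition).headOption
```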
@rxin any comments?

@nezihyigitbasi, do you plan to add something similar for DF/DS API?

@hbhanawat once we figure out the details I think it makes sense to do that.

@rxin any plans to review this?
we don't need a val here do we?
no we don't.
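(Context for readers outside the thread: this is the usual Scala rule, sketched below with made-up class names. A constructor parameter only needs val if it should be readable as a field from outside the class.)

```scala
// `val` promotes a constructor parameter to a public field; without it the
// parameter is visible only inside the class body.
class WithVal(val maxPartitions: Int)    // new WithVal(4).maxPartitions compiles
class WithoutVal(maxPartitions: Int) {   // maxPartitions is not accessible from outside
  def halved: Int = maxPartitions / 2
}
```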
The API change looks alright. I'd separate the dataset changes from this one. Are there other things you want to do before this is no longer WIP?

If the API changes look OK to you, then I don't have anything else to do before this is no longer WIP. I only need to resolve conflicts with master.
Force-pushed from adc12e6 to 016a896
@rxin rebased & addressed comments.
this should have parentheses since it has a side effect
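(Context: this is the standard Scala convention, sketched below with invented names. Methods that perform side effects are declared and called with empty parentheses, while pure accessors may drop them.)

```scala
import scala.collection.mutable.ArrayBuffer

class GroupTracker {
  private val groups = ArrayBuffer[String]()

  // Mutates state, so it is declared (and called) with parentheses: tracker.reset()
  def reset(): Unit = groups.clear()

  // Pure accessor with no side effects, so the parentheses are omitted: tracker.size
  def size: Int = groups.length
}
```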
Also, can you tag all these APIs as DeveloperApi? Thanks.
Force-pushed from 016a896 to 9afeaa0
I followed the naming convention for other classes, let me know if you still want lower-case.
Yeah, actually a lot of the naming in Spark core is wrong, but we never bothered changing it. Usually SomeWord.scala means there is a class named SomeWord. The Scala style guide actually recommends that when there are multiple classes that are part of a coherent group, the file name start with a lowercase letter (similar to a lot of C++ naming guides).
Force-pushed from 9afeaa0 to c61ab42
@rxin thanks for the comments. Updated.
We would need to add the label here, something like ::DeveloperApi. Look up other classes to confirm.
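(For reference, the pattern used elsewhere in Spark core combines a Scaladoc label with the annotation from org.apache.spark.annotation; the class name below is just a placeholder.)

```scala
import org.apache.spark.annotation.DeveloperApi

/**
 * :: DeveloperApi ::
 * One-line description of the class; the label above is the marker Spark's
 * documentation uses to flag developer-facing APIs.
 */
@DeveloperApi
class SomeDeveloperFacingClass
```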
LGTM other than those couple of minor feedback items.
Force-pushed from c61ab42 to 9d91f77
@rxin thanks, comments addressed. Renamed that file to use lower-case too.
Thanks - let's wait for Jenkins. Can you update the title / description of the pull request?
Test build #2818 has finished for PR 11865 at commit
Mima tests failing, I guess we can exclude them all. What do you think?

[info] spark-core: found 3 potential binary incompatibilities while checking against org.apache.spark:spark-core_2.11:1.6.0 (filtered 1299)
[error] * method coalesce(Int,Boolean,scala.math.Ordering)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.coalesce")
[error] * class org.apache.spark.rdd.PartitionCoalescer#LocationIterator does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.rdd.PartitionCoalescer$LocationIterator")
[error] * declaration of class org.apache.spark.rdd.PartitionCoalescer is interface org.apache.spark.rdd.PartitionCoalescer in current version; changing class to interface breaks client code
[error]   filter with: ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.rdd.PartitionCoalescer")
Yup go for it.
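(For readers unfamiliar with the process: Spark silences known-benign MiMa failures by adding the suggested ProblemFilters entries to project/MimaExcludes.scala. Below is a rough sketch of the additions; the wrapping object and val name are invented here just to keep the snippet self-contained, since the real file's structure changes between releases.)

```scala
import com.typesafe.tools.mima.core._

// Sketch only: in Spark these entries go into the Seq of excludes for the
// upcoming release inside project/MimaExcludes.scala.
object CoalescerMimaExcludes {
  val coalescerExcludes = Seq(
    // [SPARK-14042] Add custom coalescer support
    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.coalesce"),
    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.rdd.PartitionCoalescer$LocationIterator"),
    ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.rdd.PartitionCoalescer")
  )
}
```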
Force-pushed from 9d91f77 to 5a12586
@rxin somehow jenkins didn't start the tests after my last push, can you please kick it off?
Test build #2829 has finished for PR 11865 at commit

@rxin tests look OK, do you have any other comments?
Merging in master. Thanks. |
What changes were proposed in this pull request?
This PR adds support for specifying an optional custom coalescer to the coalesce() method. Currently I have only added this feature to the RDD interface, and once we sort out the details we can proceed with adding this feature to the other APIs (Dataset etc.)

How was this patch tested?
Added a unit test for this functionality.
/cc @rxin (per our discussion on the mailing list)
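To make the proposed API concrete, below is a rough sketch of the shape described above: a PartitionCoalescer hook passed through a new optional parameter on RDD.coalesce, together with an illustrative user-defined coalescer that packs a fixed number of parent partitions into each group. The class FixedSizeCoalescer and the exact parameter names are assumptions for the example; the authoritative signatures are the ones in the PR itself.

```scala
import org.apache.spark.rdd.{PartitionCoalescer, PartitionGroup, RDD}

// Illustrative custom coalescer: packs every `groupSize` consecutive parent
// partitions into one PartitionGroup, ignoring locality. Coalescers passed to
// RDD.coalesce are expected to be serializable.
class FixedSizeCoalescer(groupSize: Int) extends PartitionCoalescer with Serializable {
  override def coalesce(maxPartitions: Int, parent: RDD[_]): Array[PartitionGroup] = {
    parent.partitions.grouped(groupSize).map { ps =>
      val group = new PartitionGroup()   // no preferred location for this group
      group.partitions ++= ps
      group
    }.toArray
  }
}

// Usage sketch: pass the coalescer through the new optional parameter on coalesce().
// val coalesced = rdd.coalesce(numPartitions = 8, shuffle = false,
//                              partitionCoalescer = Some(new FixedSizeCoalescer(4)))
```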