[SPARK-14042][CORE] Add custom coalescer support #11865
Conversation
Since HadoopPartition is not public, a user who wants to implement this outside of Spark may have some trouble.
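(Illustration only, not from the thread: without access to HadoopPartition, a coalescer written outside Spark can only rely on public information such as the parent RDD's preferred locations; the helper below is hypothetical.)

```scala
import org.apache.spark.Partition
import org.apache.spark.rdd.RDD

// Hypothetical helper: lacking access to HadoopPartition and its InputSplit,
// a third-party coalescer can still ask the parent RDD for each partition's
// preferred hosts through the public preferredLocations API.
def preferredHost(parent: RDD[_], partition: Partition): Option[String] =
  parent.preferredLocations(partition).headOption
```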
@rxin any comments?

@nezihyigitbasi, do you plan to add something similar for DF/DS API?

@hbhanawat once we figure out the details I think it makes sense to do that.

@rxin any plans to review this?
we don't need a val here do we?
no we don't.
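(Context for readers outside the thread: this is the usual Scala rule, sketched below with made-up class names. A constructor parameter only needs val if it should be readable as a field from outside the class.)

```scala
// `val` promotes a constructor parameter to a public field; without it the
// parameter is visible only inside the class body.
class WithVal(val maxPartitions: Int)    // new WithVal(4).maxPartitions compiles
class WithoutVal(maxPartitions: Int) {   // maxPartitions is not accessible from outside
  def halved: Int = maxPartitions / 2
}
```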
The API change looks alright. I'd separate the dataset changes from this one. Are there other things you want to do before this is no longer WIP?

If the API changes look OK to you, then I don't have anything else to do before this is no longer WIP. I only need to resolve conflicts with master.
Force-pushed from adc12e6 to 016a896
@rxin rebased & addressed comments.
this should have parentheses since it has a side effect
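(Context: this is the standard Scala convention, sketched below with invented names. Methods that perform side effects are declared and called with empty parentheses, while pure accessors may drop them.)

```scala
import scala.collection.mutable.ArrayBuffer

class GroupTracker {
  private val groups = ArrayBuffer[String]()

  // Mutates state, so it is declared (and called) with parentheses: tracker.reset()
  def reset(): Unit = groups.clear()

  // Pure accessor with no side effects, so the parentheses are omitted: tracker.size
  def size: Int = groups.length
}
```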
Also, can you tag all these APIs as DeveloperApi? Thanks.
Force-pushed from 016a896 to 9afeaa0
I followed the naming convention for other classes, let me know if you still want lower-case.
Yeah, actually a lot of the naming in Spark core is wrong, but we never bothered changing it. Usually SomeWord.scala means there is a class named SomeWord. The Scala style guide actually recommends that when there are multiple classes that are part of a coherent group, the file name start with a lowercase letter (similar to a lot of C++ naming guides).
Force-pushed from 9afeaa0 to c61ab42
@rxin thanks for the comments. Updated.
We would need to add the label here, something like ::DeveloperApi. Look up other classes to confirm.
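(For reference, the pattern used elsewhere in Spark core combines a Scaladoc label with the annotation from org.apache.spark.annotation; the class name below is just a placeholder.)

```scala
import org.apache.spark.annotation.DeveloperApi

/**
 * :: DeveloperApi ::
 * One-line description of the class; the label above is the marker Spark's
 * documentation uses to flag developer-facing APIs.
 */
@DeveloperApi
class SomeDeveloperFacingClass
```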
LGTM other than those couple of minor feedback items.
Force-pushed from c61ab42 to 9d91f77
@rxin thanks, comments addressed. Renamed that file to use lower-case too.
Thanks - let's wait for Jenkins. Can you update the title / description of the pull request?
Test build #2818 has finished for PR 11865 at commit
Mima tests failing, I guess we can exclude them all. What do you think?

[info] spark-core: found 3 potential binary incompatibilities while checking against org.apache.spark:spark-core_2.11:1.6.0 (filtered 1299)
[error] * method coalesce(Int,Boolean,scala.math.Ordering)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.coalesce")
[error] * class org.apache.spark.rdd.PartitionCoalescer#LocationIterator does not have a correspondent in current version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.rdd.PartitionCoalescer$LocationIterator")
[error] * declaration of class org.apache.spark.rdd.PartitionCoalescer is interface org.apache.spark.rdd.PartitionCoalescer in current version; changing class to interface breaks client code
[error]   filter with: ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.rdd.PartitionCoalescer")
Yup go for it.
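(For readers unfamiliar with the process: Spark silences known-benign MiMa failures by adding the suggested ProblemFilters entries to project/MimaExcludes.scala. Below is a rough sketch of the additions; the wrapping object and val name are invented here just to keep the snippet self-contained, since the real file's structure changes between releases.)

```scala
import com.typesafe.tools.mima.core._

// Sketch only: in Spark these entries go into the Seq of excludes for the
// upcoming release inside project/MimaExcludes.scala.
object CoalescerMimaExcludes {
  val coalescerExcludes = Seq(
    // [SPARK-14042] Add custom coalescer support
    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.coalesce"),
    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.rdd.PartitionCoalescer$LocationIterator"),
    ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.rdd.PartitionCoalescer")
  )
}
```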
Force-pushed from 9d91f77 to 5a12586
@rxin somehow jenkins didn't start the tests after my last push, can you please kick it off?
Test build #2829 has finished for PR 11865 at commit

@rxin tests look OK, do you have any other comments?
Merging in master. Thanks. |
What changes were proposed in this pull request?
This PR adds support for specifying an optional custom coalescer to the coalesce() method. Currently I have only added this feature to the RDD interface, and once we sort out the details we can proceed with adding this feature to the other APIs (Dataset etc.)

How was this patch tested?
Added a unit test for this functionality.
/cc @rxin (per our discussion on the mailing list)
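To make the proposed API concrete, below is a rough sketch of the shape described above: a PartitionCoalescer hook passed through a new optional parameter on RDD.coalesce, together with an illustrative user-defined coalescer that packs a fixed number of parent partitions into each group. The class FixedSizeCoalescer and the exact parameter names are assumptions for the example; the authoritative signatures are the ones in the PR itself.

```scala
import org.apache.spark.rdd.{PartitionCoalescer, PartitionGroup, RDD}

// Illustrative custom coalescer: packs every `groupSize` consecutive parent
// partitions into one PartitionGroup, ignoring locality. Coalescers passed to
// RDD.coalesce are expected to be serializable.
class FixedSizeCoalescer(groupSize: Int) extends PartitionCoalescer with Serializable {
  override def coalesce(maxPartitions: Int, parent: RDD[_]): Array[PartitionGroup] = {
    parent.partitions.grouped(groupSize).map { ps =>
      val group = new PartitionGroup()   // no preferred location for this group
      group.partitions ++= ps
      group
    }.toArray
  }
}

// Usage sketch: pass the coalescer through the new optional parameter on coalesce().
// val coalesced = rdd.coalesce(numPartitions = 8, shuffle = false,
//                              partitionCoalescer = Some(new FixedSizeCoalescer(4)))
```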