Skip to content

Conversation

@nezihyigitbasi
Copy link
Contributor

@nezihyigitbasi nezihyigitbasi commented Mar 21, 2016

What changes were proposed in this pull request?

This PR adds support for specifying an optional custom coalescer to the coalesce() method. Currently I have only added this feature to the RDD interface, and once we sort out the details we can proceed with adding this feature to the other APIs (Dataset etc.)

How was this patch tested?

Added a unit test for this functionality.

/cc @rxin (per our discussion on the mailing list)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since HadoopPartition is not public a user who wants to implement this outside of Spark can have some trouble.

@nezihyigitbasi
Copy link
Contributor Author

@rxin any comments?

@hbhanawat
Copy link

@nezihyigitbasi, do you plan to add something similar for DF/DS API?

@nezihyigitbasi
Copy link
Contributor Author

@hbhanawat once we figure out the details I think it makes sense to do that.

@nezihyigitbasi
Copy link
Contributor Author

@rxin any plans to review this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need a val here do we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no we don't.

@rxin
Copy link
Contributor

rxin commented Apr 18, 2016

The API change looks alright. I'd separate the dataset changes from this one. Are there other things you want to do before this is not WIP?

@nezihyigitbasi
Copy link
Contributor Author

If the API changes look OK to you, then I don't have anything else before this is not WIP. I only need to resolve conflicts with the master.

@nezihyigitbasi nezihyigitbasi force-pushed the custom_coalesce_policy branch from adc12e6 to 016a896 Compare April 18, 2016 21:30
@nezihyigitbasi
Copy link
Contributor Author

@rxin rebased & addressed comments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have parentheses since it has side effect

@rxin
Copy link
Contributor

rxin commented Apr 18, 2016

Also can you tag all these apis as DeveloperApi? Thanks.'

@nezihyigitbasi nezihyigitbasi force-pushed the custom_coalesce_policy branch from 016a896 to 9afeaa0 Compare April 18, 2016 22:29
Copy link
Contributor Author

@nezihyigitbasi nezihyigitbasi Apr 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the naming convention for other classes, let me know if you still want lower-case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea actually many of the naming in spark core is wrong but we never bothered changing them. Usually SomeWord.scala means there is a class named SomeWord. The scala style guide actually recommends when there are multiple classes that are part of a coherent group, start with lowercase (similar to a lot of c++ naming guides).

@nezihyigitbasi nezihyigitbasi force-pushed the custom_coalesce_policy branch from 9afeaa0 to c61ab42 Compare April 18, 2016 22:33
@nezihyigitbasi
Copy link
Contributor Author

@rxin thanks for the comments. Updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would need to add the label here. something like ::DeveloperApi. look up other classes to confirm.

@rxin
Copy link
Contributor

rxin commented Apr 18, 2016

LGTM other than that couple minor feedback.

@nezihyigitbasi nezihyigitbasi force-pushed the custom_coalesce_policy branch from c61ab42 to 9d91f77 Compare April 19, 2016 00:12
@nezihyigitbasi
Copy link
Contributor Author

@rxin thanks, comments addressed. Renamed that file to use lower-case too.

@rxin
Copy link
Contributor

rxin commented Apr 19, 2016

Thanks - let's wait for Jenkins. Can you update the title / description of the pull request?

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #2818 has finished for PR 11865 at commit 9d91f77.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait PartitionCoalescer
    • class PartitionGroup(val prefLoc: Option[String] = None)

@nezihyigitbasi nezihyigitbasi changed the title [SPARK-14042][CORE] Add custom coalescer support [WIP] [SPARK-14042][CORE] Add custom coalescer support Apr 19, 2016
@nezihyigitbasi
Copy link
Contributor Author

Mima tests failing, I guess we can exclude them all. What do you think?

[info] spark-core: found 3 potential binary incompatibilities while checking against org.apache.spark:spark-core_2.11:1.6.0  (filtered 1299)
[error]  * method coalesce(Int,Boolean,scala.math.Ordering)org.apache.spark.rdd.RDD in class org.apache.spark.rdd.RDD does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.rdd.RDD.coalesce")
[error]  * class org.apache.spark.rdd.PartitionCoalescer#LocationIterator does not have a correspondent in current version
[error]    filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.rdd.PartitionCoalescer$LocationIterator")
[error]  * declaration of class org.apache.spark.rdd.PartitionCoalescer is interface org.apache.spark.rdd.PartitionCoalescer in current version; changing class to interface breaks client code
[error]    filter with: ProblemFilters.exclude[IncompatibleTemplateDefProblem]("org.apache.spark.rdd.PartitionCoalescer")

@rxin
Copy link
Contributor

rxin commented Apr 19, 2016

Yup go for it.

@nezihyigitbasi nezihyigitbasi force-pushed the custom_coalesce_policy branch from 9d91f77 to 5a12586 Compare April 19, 2016 03:30
@nezihyigitbasi
Copy link
Contributor Author

@rxin somehow jenkins didn't start the tests after my last push, can you please kick it off?

@SparkQA
Copy link

SparkQA commented Apr 19, 2016

Test build #2829 has finished for PR 11865 at commit 5a12586.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait PartitionCoalescer
    • class PartitionGroup(val prefLoc: Option[String] = None)

@nezihyigitbasi
Copy link
Contributor Author

@rxin tests look OK, do you have any other comments?

@rxin
Copy link
Contributor

rxin commented Apr 19, 2016

Merging in master. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants