@jerryshao
Contributor

This patch adds a sort flag to ShuffleDependency and moves the sort into the hash shuffle implementation.

Moving the sort into the shuffle implementation leaves room for other shuffle implementations (such as sort-based shuffle) to optimize sorting as part of the shuffle itself.
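
To make the idea in the description concrete, here is a minimal sketch of a shuffle read path that applies the sort itself, so the caller no longer sorts after the shuffle. It is not the code from this patch: `HashShuffleReadSketch`, `read`, and its parameters are hypothetical names, and the `Option[Boolean]` flag only mirrors the shape discussed later in this thread.

```scala
// Hypothetical sketch, not Spark's actual reader: the shuffle read path
// decides whether and how to sort, based on what the dependency carries.
object HashShuffleReadSketch {
  // ascending: Some(true) sorts ascending, Some(false) sorts descending,
  // None leaves the fetched order untouched.
  def read[K, V](
      fetched: Iterator[(K, V)],
      keyOrdering: Option[Ordering[K]],
      ascending: Option[Boolean]): Iterator[(K, V)] = {
    (keyOrdering, ascending) match {
      case (Some(ord), Some(asc)) =>
        val effective = if (asc) ord else ord.reverse
        // In-memory sort: the whole partition is buffered before sorting.
        fetched.toVector.sortBy(_._1)(effective).iterator
      case _ =>
        // No ordering or no sort flag: return the records as fetched.
        fetched
    }
  }
}
```

Because the sort lives behind `read`, a different shuffle implementation could replace the buffered in-memory sort with its own strategy without changing the caller.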

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

Contributor

Doesn't this take up a lot of memory?

Contributor Author

Yes, that's true, but I won't change the original implementation here, since PR #931 will solve this issue.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16113/

@jerryshao
Contributor Author

Hi @mateiz, would you mind taking a look at this PR? Thanks a lot.

@mateiz
Contributor

mateiz commented Jul 9, 2014

Sorry for the delay, Saisai -- will take a look soon. I've been away traveling this week.

@jerryshao
Contributor Author

I'm not sure whether this is what you want, so I hope you can review it and give me some comments. Thanks a lot.

@mateiz
Contributor

mateiz commented Jul 12, 2014

This looks pretty good to me API-wise, but an Option[Boolean] is kind of confusing. Maybe we should have an enumeration called SortOrder and pass in an Option[SortOrder].

Also, when this value is set, please add a check that keyOrdering is also set. Otherwise the user made an error in configuring the ShuffleDependency but we'll just return unsorted data.
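
The two suggestions above (an explicit enumeration instead of `Option[Boolean]`, plus a check that `keyOrdering` is set whenever a sort order is requested) could look roughly like the sketch below. This is an illustration under assumptions, not the code that was merged: `SortOrder` follows the name proposed in the comment, while `SimpleShuffleDependency` and its fields are simplified, hypothetical stand-ins for the real `ShuffleDependency`.

```scala
// Hypothetical enumeration, named after the suggestion in the review comment.
object SortOrder extends Enumeration {
  val ASCENDING, DESCENDING = Value
}

// Simplified stand-in for ShuffleDependency (an assumption for illustration);
// it carries only the two fields under discussion.
final case class SimpleShuffleDependency[K](
    keyOrdering: Option[Ordering[K]] = None,
    sortOrder: Option[SortOrder.Value] = None) {
  // Fail fast on a misconfigured dependency instead of silently
  // returning unsorted data.
  require(
    sortOrder.isEmpty || keyOrdering.isDefined,
    "sortOrder is set but keyOrdering is not defined")
}
```

With this shape, `SimpleShuffleDependency[Int](None, Some(SortOrder.ASCENDING))` throws an `IllegalArgumentException` at construction time, so the configuration error surfaces before any data is shuffled.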

We should also wait on #931 to be merged and then base this on that.

@jerryshao
Contributor Author

Hi Matei, thanks a lot for your review. I will change the code according to your comments.

@mateiz
Contributor

mateiz commented Jul 17, 2014

@jerryshao after looking more at #931, I'd actually like to hold off on merging that the way it's set up, so would you mind updating this now? I can merge this as is (without external sort) and then we can add external sort later.

@jerryshao
Contributor Author

Hi Matei, I've updated the code according to your comments. Would you please review this change? Thanks a lot.

@SparkQA

SparkQA commented Jul 17, 2014

QA tests have started for PR 1210. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16776/consoleFull

@SparkQA

SparkQA commented Jul 17, 2014

QA results for PR 1210:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16776/consoleFull

Contributor

This should be private[spark] or at least @DeveloperApi

Contributor

Actually probably private[spark] works for now

Contributor Author

Oops, I will add it.

@mateiz
Contributor

mateiz commented Jul 18, 2014

@ScrapCodes @pwendell there's a MIMA error on this that seems spurious: it complains that synthetic method org$apache$spark$rdd$OrderedRDDFunctions$$ordering()scala.math.Ordering in class org.apache.spark.rdd.OrderedRDDFunctions does not have a correspondent in new version. I think this is because we used the private val ordering in a closure in the old code, and don't use it in one now. Is that right? How can we add it to the MIMA excludes?

@jerryshao
Contributor Author

Shall we add this

ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.rdd.OrderedRDDFunctions.org$apache$spark$rdd$OrderedRDDFunctions$$ordering")

to the MimaExcludes?

@SparkQA

SparkQA commented Jul 18, 2014

QA tests have started for PR 1210. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16801/consoleFull

@SparkQA

SparkQA commented Jul 18, 2014

QA results for PR 1210:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16801/consoleFull

@mateiz
Contributor

mateiz commented Jul 23, 2014

Sorry for the delay -- yeah, try adding that for now.

@jerryshao
Contributor Author

Hi Matei, should I wait until your sort-based shuffle is merged into the master branch, so that I can change the current in-memory sort to an external sort?

@SparkQA

SparkQA commented Jul 24, 2014

QA tests have started for PR 1210. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17098/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA results for PR 1210:
- This patch PASSES unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17098/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA tests have started for PR 1210. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17108/consoleFull

@SparkQA

SparkQA commented Jul 24, 2014

QA results for PR 1210:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17108/consoleFull

@mateiz
Contributor

mateiz commented Jul 25, 2014

@jerryshao I'll just merge this as is for now, seems simpler.

@asfgit closed this in 47b6b38 on Jul 25, 2014

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request on Sep 4, 2014:

This patch adds a sort flag into ShuffleDependency and moves sort into hash shuffle implementation.

Moving sort into shuffle implementation can give space for other shuffle implementations (like sort-based shuffle) to better optimize sort through shuffle.

Author: jerryshao

Closes apache#1210 from jerryshao/SPARK-2125 and squashes the following commits:

2feaf7b [jerryshao] revert MimaExcludes
ceddf75 [jerryshao] add MimaExeclude
f674ff4 [jerryshao] Add missing Scope restriction
b9fe0dd [jerryshao] Fix some style issues according to comments
ef6b729 [jerryshao] Change sort flag into Option
3f6eeed [jerryshao] Fix issues related to unit test
2f552a5 [jerryshao] Minor changes about naming and order
c92a281 [jerryshao] Move sort into shuffle implementations