Skip to content

Conversation

@ted-yu
Copy link

@ted-yu ted-yu commented May 7, 2015

No description provided.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that mappartitionsWithIndex cleans, and thus anything in that is cleaned. maybe it's an incorrect assumption. cc @andrewor14?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Reynold is correct.
I can update the PR for collect() and undo the change for other methods touched.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think collect might've been cleaned in DAGScheduler's runJob. Double check that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that @andrewor14 added some tests for closure cleaning as part of his recent ClosureCleaner patch; we might check whether that test suite covers these methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test suite does not currently cover these methods because they are deprecated, but maybe we should just add them. (@JoshRosen is referring to ClosureCleanerSuite)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rxin @ted-yu actually even though mapPartitionsWithIndex does clean already, it cleans the whole closure but not the ones used in the closure. In this case, I believe it's actually necessary to clean f here since we won't actually clean it from mapPartitionsWithIndex. For the same reason I believe we also need to clean constructA since it's a closure provided by the user.

@andrewor14
Copy link
Contributor

Hey @ted-yu would you mind adding [SPARK-7237] to the title? It's a convention we use for Spark PRs to track the issues.

@andrewor14
Copy link
Contributor

ok to test

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 7, 2015

Test build #32138 has started for PR 5959 at commit 6c124a9.

@andrewor14
Copy link
Contributor

@ted-yu Also it would be good to add test cases for the particular methods you are testing in ClosureCleanerSuite. There is already a test for this specific issue and it should be fairly straight forward to just add to it. Thanks.

@SparkQA
Copy link

SparkQA commented May 7, 2015

Test build #32138 has finished for PR 5959 at commit 6c124a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32138/
Test PASSed.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 7, 2015

Test build #32150 has started for PR 5959 at commit 6846e40.

@ted-yu ted-yu changed the title Clean function in several RDD methods SPARK-7237 Clean function in several RDD methods May 7, 2015
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 7, 2015

Test build #32151 has started for PR 5959 at commit 0c8d47e.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that the tests will fail because these closures are expected to have return statements in them. Can you model these closures after the other ones in this test?

@andrewor14
Copy link
Contributor

Hi @ted-yu I believe this also needs to cover mapPartitionsWithContext and filterWith

@SparkQA
Copy link

SparkQA commented May 7, 2015

Test build #32151 has finished for PR 5959 at commit 0c8d47e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32151/
Test FAILed.

@SparkQA
Copy link

SparkQA commented May 7, 2015

Test build #32150 has finished for PR 5959 at commit 6846e40.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32236 has finished for PR 5959 at commit f6014c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32236/
Test PASSed.

@ted-yu
Copy link
Author

ted-yu commented May 8, 2015

@rxin @andrewor14
Please let me know if there is any review comment which I haven't addressed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this commented out?

@ted-yu
Copy link
Author

ted-yu commented May 8, 2015

From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32195/consoleFull :

[info] - filterWith *** FAILED *** (13 milliseconds)
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 154.0 failed 1 times, most recent failure: Lost task 1.0 in stage 154.0 (TID 865, localhost): org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32275 has started for PR 5959 at commit 56d7c92.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ted-yu I investigated this a little and this won't work because you're bringing in the whole SparkContext into the closure. Instead we need to clean the closures outside of mapPartitionsWithIndex as you have done in other methods. There is nothing special about filterWith.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once you have done that please uncomment the tests

@ted-yu
Copy link
Author

ted-yu commented May 8, 2015

Should have looked closer :-)

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32279 has started for PR 5959 at commit f83d445.

@SparkQA
Copy link

SparkQA commented May 8, 2015

Test build #32275 has finished for PR 5959 at commit 56d7c92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32275/
Test PASSed.

@andrewor14
Copy link
Contributor

Great, merging this into master and 1.4. Thanks @ted-yu.

asfgit pushed a commit that referenced this pull request May 9, 2015
Author: tedyu <[email protected]>

Closes #5959 from ted-yu/master and squashes the following commits:

f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex
56d7c92 [tedyu] Consolidate import of Random
f6014c0 [tedyu] Remove cleaning in RDD#filterWith
36feb6c [tedyu] Try to get correct syntax
55d01eb [tedyu] Try to get correct syntax
c2786df [tedyu] Correct syntax
d92bfcf [tedyu] Correct syntax in test
164d3e4 [tedyu] Correct variable name
8b50d93 [tedyu] Address Andrew's review comments
0c8d47e [tedyu] Add test for mapWith()
6846e40 [tedyu] Add test for flatMapWith()
6c124a9 [tedyu] Clean function in several RDD methods

(cherry picked from commit 54e6fa0)
Signed-off-by: Andrew Or <[email protected]>
@asfgit asfgit closed this in 54e6fa0 May 9, 2015
@SparkQA
Copy link

SparkQA commented May 9, 2015

Test build #32279 has finished for PR 5959 at commit f83d445.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32279/
Test PASSed.

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
Author: tedyu <[email protected]>

Closes apache#5959 from ted-yu/master and squashes the following commits:

f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex
56d7c92 [tedyu] Consolidate import of Random
f6014c0 [tedyu] Remove cleaning in RDD#filterWith
36feb6c [tedyu] Try to get correct syntax
55d01eb [tedyu] Try to get correct syntax
c2786df [tedyu] Correct syntax
d92bfcf [tedyu] Correct syntax in test
164d3e4 [tedyu] Correct variable name
8b50d93 [tedyu] Address Andrew's review comments
0c8d47e [tedyu] Add test for mapWith()
6846e40 [tedyu] Add test for flatMapWith()
6c124a9 [tedyu] Clean function in several RDD methods
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
Author: tedyu <[email protected]>

Closes apache#5959 from ted-yu/master and squashes the following commits:

f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex
56d7c92 [tedyu] Consolidate import of Random
f6014c0 [tedyu] Remove cleaning in RDD#filterWith
36feb6c [tedyu] Try to get correct syntax
55d01eb [tedyu] Try to get correct syntax
c2786df [tedyu] Correct syntax
d92bfcf [tedyu] Correct syntax in test
164d3e4 [tedyu] Correct variable name
8b50d93 [tedyu] Address Andrew's review comments
0c8d47e [tedyu] Add test for mapWith()
6846e40 [tedyu] Add test for flatMapWith()
6c124a9 [tedyu] Clean function in several RDD methods
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Author: tedyu <[email protected]>

Closes apache#5959 from ted-yu/master and squashes the following commits:

f83d445 [tedyu] Move cleaning outside of mapPartitionsWithIndex
56d7c92 [tedyu] Consolidate import of Random
f6014c0 [tedyu] Remove cleaning in RDD#filterWith
36feb6c [tedyu] Try to get correct syntax
55d01eb [tedyu] Try to get correct syntax
c2786df [tedyu] Correct syntax
d92bfcf [tedyu] Correct syntax in test
164d3e4 [tedyu] Correct variable name
8b50d93 [tedyu] Address Andrew's review comments
0c8d47e [tedyu] Add test for mapWith()
6846e40 [tedyu] Add test for flatMapWith()
6c124a9 [tedyu] Clean function in several RDD methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants