Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Apr 1, 2015

This PR moved the code of creating HeartbeatReceiver above the code of creating schedulerBackend to resolve the race condition.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29525 has started for PR 5306 at commit d7c250d.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29525 has finished for PR 5306 at commit d7c250d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29525/
Test PASSed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks weird here. I think it should be if (scheduler != null)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. It's a typo. Thank you for pointing it out.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29546 has started for PR 5306 at commit dd202c7.

@SparkQA
Copy link

SparkQA commented Apr 1, 2015

Test build #29546 has finished for PR 5306 at commit dd202c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29546/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the need for reregistering vs. just ignoring the heartbeat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reregistering: Because Executor uses askWithReply to send Heartbeat, I want to send something back to Executor to avoid timeout.

@andrewor14
Copy link
Contributor

@zsxwing this looks fine, though a little funky because we send TaskScheduler and all of its pointers through Akka, though I don't see another simpler alternative. In the past I've seen Akka silently drop messages that contain objects that are too big. Here we log a warning if that happens so I suppose it will be fine. I will merge this once you address the minor comments.

@zsxwing
Copy link
Member Author

zsxwing commented Apr 3, 2015

@zsxwing this looks fine, though a little funky because we send TaskScheduler and all of its pointers through Akka, though I don't see another simpler alternative.

We can get TaskScheduler from SparkContext since HeartbeatReceiver has a pointer to SparkContext. I updated the codes to eliminate TaskScheduler from the message.

@SparkQA
Copy link

SparkQA commented Apr 3, 2015

Test build #29677 has started for PR 5306 at commit 840399d.

@SparkQA
Copy link

SparkQA commented Apr 3, 2015

Test build #29677 has finished for PR 5306 at commit 840399d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29677/
Test PASSed.

@andrewor14
Copy link
Contributor

LGTM merging this into master thanks. The new solution looks much better!

@asfgit asfgit closed this in 88504b7 Apr 3, 2015
@zsxwing zsxwing deleted the SPARK-6640 branch April 5, 2015 07:00
@rxin
Copy link
Contributor

rxin commented Apr 6, 2015

Should this be in 1.3.2? Seems like people are hitting it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants