[SPARK-6640][Core] Fix the race condition of creating HeartbeatReceiver and retrieving HeartbeatReceiver #5306
Test build #29525 has started for PR 5306 at commit
Test build #29525 has finished for PR 5306 at commit
Test PASSed.
Looks weird here. I think it should be if (scheduler != null)?
Right. It's a typo. Thank you for pointing it out.
Test build #29546 has started for PR 5306 at commit
Test build #29546 has finished for PR 5306 at commit
Test PASSed.
Why the need for reregistering vs. just ignoring the heartbeat?
For reregistering: Because Executor uses askWithReply to send Heartbeat, I want to send something back to Executor to avoid timeout.
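The trade-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `HeartbeatReplySketch`, `HeartbeatResponse`, `ignoringReceiver`, `replyingReceiver`, and `ask` are hypothetical names built on plain Scala futures, not Spark's RPC classes, and tying the reregister flag to scheduler readiness is an assumption made for the example.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object HeartbeatReplySketch {
  // Hypothetical reply type; the field name is illustrative only.
  final case class HeartbeatResponse(reregister: Boolean)

  // A receiver that silently ignores the heartbeat: the reply future never completes.
  def ignoringReceiver(): Future[HeartbeatResponse] =
    Promise[HeartbeatResponse]().future

  // A receiver that always answers, asking the sender to reregister
  // when the scheduler is not available yet (assumed condition).
  def replyingReceiver(schedulerReady: Boolean): Future[HeartbeatResponse] =
    Future.successful(HeartbeatResponse(reregister = !schedulerReady))

  // Sender side of an ask-style call: block for a reply up to the timeout.
  def ask(reply: Future[HeartbeatResponse],
          timeout: FiniteDuration): Either[String, HeartbeatResponse] =
    try Right(Await.result(reply, timeout))
    catch { case _: TimeoutException => Left("timeout") }
}
```

With an ignoring receiver the sender can only wait for the timeout, whereas a replying receiver lets it learn immediately that it should register again.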
@zsxwing this looks fine, though a little funky because we send
We can get TaskScheduler from SparkContext since HeartbeatReceiver has a pointer to SparkContext. I updated the code to eliminate TaskScheduler from the message.
Test build #29677 has started for PR 5306 at commit
Test build #29677 has finished for PR 5306 at commit
Test PASSed.
LGTM merging this into master thanks. The new solution looks much better!
Should this be in 1.3.2? Seems like people are hitting it?
This PR moved the code that creates HeartbeatReceiver above the code that creates schedulerBackend to resolve the race condition.
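Why the ordering matters can be shown with a small sketch. It assumes, for illustration only, that starting the backend can immediately trigger a lookup of the receiver; `EndpointRegistry`, `InitOrderSketch`, and their methods are hypothetical stand-ins, not Spark's actual classes.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for an endpoint registry; not Spark's API.
object EndpointRegistry {
  private val endpoints = TrieMap.empty[String, String => String]

  def register(name: String)(handler: String => String): Unit =
    endpoints.update(name, handler)

  def lookup(name: String): Option[String => String] = endpoints.get(name)

  def clear(): Unit = endpoints.clear()
}

object InitOrderSketch {
  // Simulates a backend whose startup immediately tries to retrieve the receiver
  // (an assumption made for this sketch). Returns whether the lookup succeeded.
  def startBackend(): Boolean =
    EndpointRegistry.lookup("HeartbeatReceiver").isDefined

  // Ordering after the fix: the receiver exists before the backend starts.
  def receiverFirst(): Boolean = {
    EndpointRegistry.clear()
    EndpointRegistry.register("HeartbeatReceiver")(msg => s"ack:$msg")
    startBackend()
  }

  // Ordering before the fix: the backend may look up a receiver that is not there yet.
  def backendFirst(): Boolean = {
    EndpointRegistry.clear()
    val found = startBackend()
    EndpointRegistry.register("HeartbeatReceiver")(msg => s"ack:$msg")
    found
  }
}
```

In this sketch the failing case is deterministic; in a real concurrent system the lookup would fail only some of the time, which is what makes it a race.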