[SPARK-6331] Load new master URL if present when recovering streaming context from checkpoint #5024
Conversation
@harishreedharan Can you take a look?

Test build #28604 has started for PR 5024 at commit

Test build #28605 has started for PR 5024 at commit

Test build #28604 has finished for PR 5024 at commit

Test PASSed.

Test build #28605 has finished for PR 5024 at commit

Test PASSed.
case _ => should be enough, no? Is there a need to return None? (Since None is an object, I am not sure what the real cost is in this case for returning something - so this can be ignored if the cost here is ~zero)
Right, I can make it better by using foreach on the Option. That was stupid and too hurried.

On Mar 14, 2015 12:05 AM, "Hari Shreedharan" wrote, in streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala (#5024 comment, https://github.com/apache/spark/pull/5024/files#r26436080):

      .remove("spark.driver.host")
      .remove("spark.driver.port")
    new SparkConf(loadDefaults = true).getOption("spark.master") match {
      case Some(newMaster) => newSparkConf.setMaster(newMaster)
      case _ => None
    }
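As a follow-up to that exchange, here is a minimal sketch of what the foreach-based version could look like. This is not the exact patch from this PR; `newSparkConf` stands in for the configuration that Checkpoint.scala rebuilds from the checkpointed properties.

```scala
import org.apache.spark.SparkConf

// Stand-in for the SparkConf rebuilt from the checkpointed properties,
// with driver-specific settings removed so they are re-assigned on restart.
val newSparkConf = new SparkConf(loadDefaults = false)
  .remove("spark.driver.host")
  .remove("spark.driver.port")

// A default-loading SparkConf picks up system properties set by spark-submit.
// foreach on the Option replaces the match: if a master URL is present it
// overrides the checkpointed one; otherwise nothing happens and no None is produced.
new SparkConf(loadDefaults = true).getOption("spark.master").foreach { newMaster =>
  newSparkConf.setMaster(newMaster)
}
```

Either form behaves the same at runtime; the foreach version simply avoids the unused None result discussed above.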
LGTM.

LGTM. A more general thought, maybe not relevant to this PR: if some configurations are changed after resubmitting the application, how should this be handled? Should we choose the new configuration or still keep the old one, for settings like memory size or core count?
Test build #28666 has started for PR 5024 at commit

Test build #28666 has finished for PR 5024 at commit

Test PASSed.
Why is this being called oldMaster? Isn't this an option wrapping the new master?
Test build #28674 has started for PR 5024 at commit

Test build #28674 has finished for PR 5024 at commit

Test PASSed.
[SPARK-6331] Load new master URL if present when recovering streaming context from checkpoint

In streaming driver recovery, when the SparkConf is reconstructed based on the checkpointed configuration, it recovers the old master URL. This is okay if the cluster on which the streaming application is relaunched is the same cluster it was running on before. But if that cluster changes, there is no way to inject the new master URL of the new cluster. As a result, the restarted app tries to connect to the non-existent old cluster and fails.

The solution is to check whether a master URL is set in the System properties (by Spark submit) before recreating the SparkConf. If a new master URL is set in the properties, then use it, as that is obviously the most relevant one. Otherwise load the old one (to maintain existing behavior).

Author: Tathagata Das <[email protected]>

Closes #5024 from tdas/SPARK-6331 and squashes the following commits:

392fd44 [Tathagata Das] Fixed naming issue.
c7c0b99 [Tathagata Das] Addressed comments.
6a0857c [Tathagata Das] Updated testsuites.
222485d [Tathagata Das] Load new master URL if present when recovering streaming context from checkpoint

(cherry picked from commit c928796)
Signed-off-by: Tathagata Das <[email protected]>
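For reference, here is a minimal sketch of the recovery logic described above, not the verbatim patch: `reconstructSparkConf` and `checkpointedProperties` are hypothetical names standing in for the code in Checkpoint.scala that rebuilds the SparkConf from the checkpoint.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper illustrating the described behavior: prefer a master URL
// supplied by spark-submit (via system properties), otherwise keep the old one.
def reconstructSparkConf(checkpointedProperties: Seq[(String, String)]): SparkConf = {
  // Rebuild the conf from the checkpointed values (these include the old master
  // URL), dropping driver-specific keys that must be re-assigned on restart.
  val newSparkConf = new SparkConf(loadDefaults = false)
    .setAll(checkpointedProperties)
    .remove("spark.driver.host")
    .remove("spark.driver.port")

  // A default-loading SparkConf reads system properties set by spark-submit.
  // If it carries spark.master, that new URL wins; if not, the checkpointed
  // master URL stays in place, preserving the existing behavior.
  new SparkConf(loadDefaults = true).getOption("spark.master").foreach { newMaster =>
    newSparkConf.setMaster(newMaster)
  }
  newSparkConf
}
```

With this behavior, relaunching the recovered application through spark-submit with a different --master value is enough for the restarted driver to target the new cluster, since spark-submit places spark.master in the system properties.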