[SPARK-4899][MESOS] Support for Checkpointing on Coarse Grained Mode #17750
gkc2104 wants to merge 5 commits into apache:master
Conversation
It would be great to have this soon in 2.2.x (maybe even backported to 2.1.x).

IMO we should not enable checkpointing in fine-grained mode. With checkpointing enabled, Mesos agents would persist all status updates to disk, which means a significant I/O cost, because fine-grained mode uses Mesos status updates to send task results back to the driver. Also, I'm not sure whether it makes sense to set the …

Hey @lins05, thanks for taking the time to look into this. Yes, it is true that there is an associated overhead in both modes; that's why the defaults have not been changed, i.e. the default behavior is not to checkpoint. Setting … And considering that this is being used in the latest version, I guess the Spark driver does support it.

The overhead in fine-grained mode would be much heavier than in coarse-grained mode. For example, with checkpointing enabled, each time you run … In contrast, in coarse-grained mode the executor would send the 100MB of data to the driver directly, without going through the Mesos agents. The only things agents write to disk are small task status messages like TASK_RUNNING/TASK_KILLED, which are typically a few kilobytes.

The code in your link is the Mesos cluster scheduler, which is a Mesos framework that launches Spark drivers for you, not the Mesos scheduler inside the Spark driver that launches executors. It has …
Do you think it would then be a viable option to enable it by default in coarse-grained mode and not use it in fine-grained mode?

This makes sense now. I definitely did not consider this, but it explains it.

Could you expand on this a bit more? I assume we could maintain the state of the tasks similar to how driver state is maintained in … I'll start implementing that, if you think we could enable it to reconcileTasks with state.
SGTM, especially considering fine-grained mode is already deprecated.
I don't think it's an easy task at all, because the Spark driver is not designed to recover from a crash. The state in the MesosClusterScheduler is pretty simple: it's just a REST server that accepts requests from clients and launches Spark drivers on their behalf. It only needs to persist its Mesos framework id, because it needs to re-register with the Mesos master using the same framework id if it's restarted. In the current implementation, MesosClusterScheduler uses ZooKeeper as the persistent storage. Aside from that, the MesosClusterScheduler has no other stateful information. The Spark driver is totally different, because it contains lots of stateful information: the job/stage/task info, executor info, the catalog that holds temporary views, to name a few. All of those are kept in the driver's memory and would be lost whenever the driver crashes. So it doesn't make sense to set …
I am looking at solving a problem where an intermittent network partition can result in the driver being killed unnecessarily. It's possible that adding a failover_timeout will solve that, but I'm still looking into it.
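For context, both agent checkpointing and a failover timeout are opted into on the Mesos `FrameworkInfo` at registration time. A minimal sketch using the Mesos Java protobuf API — the timeout value here is illustrative, not something this PR ships:

```scala
import org.apache.mesos.Protos.FrameworkInfo

// Sketch: a framework that survives transient disconnects. failover_timeout tells
// the Mesos master how long (in seconds) to keep the framework's tasks alive while
// waiting for the scheduler to re-register; checkpoint tells agents to persist the
// framework pid, executor pids, and status updates so an agent restart does not
// kill the executors.
val frameworkInfo = FrameworkInfo.newBuilder()
  .setName("Spark")
  .setUser("")                          // empty => Mesos fills in the current user
  .setCheckpoint(true)
  .setFailoverTimeout(7 * 24 * 3600.0)  // e.g. one week; value is illustrative
  .build()
```

With no failover timeout set, the Mesos master treats a scheduler disconnect as framework teardown, which is exactly the "driver killed by an intermittent network partition" failure mode described above.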
Updated the PR to only include checkpointing in coarse-grained mode.
<td><code>spark.mesos.checkpoint</code></td>
<td>false</td>
<td>
  If set, agents running tasks started by this framework will write the framework pid, executor pids and status updates to disk.
Let's customize this copy a bit for Spark instead of just copying the protobuf docs. e.g. "tasks" should be "executors" and you should remove the part about "this framework", in place of something about Spark in particular.
sc.conf,
sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.webUrl)),
None,
sc.conf.getOption("spark.mesos.checkpoint").map(_.toBoolean),
We're trying to move all config over to https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/config.scala
Please add this there.
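A sketch of what the requested entry might look like in `config.scala`, following the `ConfigBuilder` pattern used there (the doc string is paraphrased and illustrative, not final copy):

```scala
// Sketch only: mirrors the ConfigBuilder style used elsewhere in
// resource-managers/mesos/.../config.scala.
private[spark] val CHECKPOINT =
  ConfigBuilder("spark.mesos.checkpoint")
    .doc("If set to true, agents running Spark executors will write the framework " +
      "pid, executor pids and status updates to disk.")
    .booleanConf
    .createOptional  // Option[Boolean]: absent means "use the Mesos default"
```

Using `createOptional` rather than a default of `false` lets the scheduler distinguish "user explicitly disabled checkpointing" from "not configured".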
sc.conf.getOption("spark.mesos.driver.webui.url").orElse(sc.ui.map(_.webUrl)),
Option.empty,
Option.empty,
None,
<td>false</td>
<td>
  If set, agents running tasks started by this framework will write the framework pid, executor pids and status updates to disk.
  If set to true, the agents that are running the spark-executors will write framework pids (Spark), executor pids and status updates to disk.
nits:
s/spark/Spark
remove the '-'
remove "(Spark)". All of this data applies to Spark, not just the framework pid.
s/pids/pid (there's only one framework)
.createOptional
private[spark] val CHECKPOINT =
  ConfigBuilder("spark.mesos.checkpoint")
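For reference, an optional boolean config like this would then be threaded into the `FrameworkInfo` when the coarse-grained scheduler registers with Mesos. A hedged sketch — the names `conf` and `builder` and the exact wiring are illustrative, not the PR's actual code:

```scala
// Sketch: only call setCheckpoint when the user explicitly configured the flag,
// so an unset config leaves the Mesos agent default in place.
val checkpoint: Option[Boolean] =
  conf.getOption("spark.mesos.checkpoint").map(_.toBoolean)

val builder = org.apache.mesos.Protos.FrameworkInfo.newBuilder()
  .setName("Spark")
  .setUser("")

checkpoint.foreach(builder.setCheckpoint)  // no-op when the option is unset
val frameworkInfo = builder.build()
```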
LGTM. @srowen, can we get a merge? Thanks.

Actually, first, @gkc2104 can you please remove "fine-grained mode" from the PR title?
<td><code>spark.mesos.checkpoint</code></td>
<td>false</td>
<td>
  If set to true, the agents that are running the Spark executors will write the framework pid, executor pids and status updates to disk.
@gkc2104 @mgummelt Will there be a separate issue & PR for adding the failover_timeout?

Ping @srowen, I think this PR is ready to merge.

Can one of the admins verify this patch?

@srowen @mgummelt @gkc2104 Just curious, why did we not merge this? Or has this feature been addressed elsewhere already? I couldn't find it anywhere in the latest codebase or documentation. Please accept my apology in advance if this feature has already been merged; as a beginner in Spark, I am still unaware of all the features. Thanks
Support for Mesos checkpointing
https://issues.apache.org/jira/browse/SPARK-4899
#60
What changes were proposed in this pull request?
Enabled checkpointing in coarse-grained mode
How was this patch tested?
Unit tests ensure that the correct SchedulerDriver is created
Please review http://spark.apache.org/contributing.html before opening a pull request.