[SPARK-40404][DOCS] Add precondition description for spark.shuffle.service.db.backend in running-on-yarn.md
#37853
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -852,8 +852,8 @@ The following extra configuration options are available when the shuffle service | |
| <td><code>spark.shuffle.service.db.backend</code></td> | ||
| <td>LEVELDB</td> | ||
| <td> | ||
| To specify the kind of disk-base store used in shuffle service state store, supports `LEVELDB` and `ROCKSDB` now | ||
| and `LEVELDB` as default value. | ||
| When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle | ||
| service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value. | ||
| The original data store in `LevelDB/RocksDB` will not be automatically convert to another kind of storage now. | ||
|
Member
+1 for @mridulm's comment. Also, could you add a description of what happens at runtime when the store types are mismatched? It's deleted and recreated, right?
Contributor
Author
The old one will not be deleted, but the new one will be created. When the store type is switched, the directory name will change, for example, from
Contributor
Author
Add
Contributor
Author
@dongjoon-hyun @mridulm Automatic data format conversion may be a useful feature. I think it would make it easier for existing users to migrate to the new features. I have filed Jira SPARK-40464 and will push for its completion if necessary.
Contributor
Author
Also cc @panbingkun, this is what I discussed with you offline yesterday. |
||
| </td> | ||
| <td>3.4.0</td> | ||
|
|
||
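To illustrate the behavior described in the review reply above (after a store type switch, the old store is kept and a new, empty store is created under a different directory name), here is a minimal Python sketch. It is a toy model under stated assumptions, not Spark's implementation: the function `open_state_store`, the base name `state`, and both directory suffixes are hypothetical, since the real directory names are not given in this thread.

```python
import tempfile
from pathlib import Path

# Hypothetical suffixes, chosen only for illustration; the real
# directory names used by the shuffle service are not shown in this thread.
HYPOTHETICAL_SUFFIX = {
    "LEVELDB": ".leveldb-example",
    "ROCKSDB": ".rocksdb-example",
}


def open_state_store(recovery_dir: Path, backend: str, base_name: str = "state") -> Path:
    """Create (or reuse) the store directory for `backend` and return its path.

    No conversion happens here: a directory left behind by the other
    backend is neither read nor deleted.
    """
    if backend not in HYPOTHETICAL_SUFFIX:
        raise ValueError(f"unsupported backend: {backend}")
    store_dir = recovery_dir / (base_name + HYPOTHETICAL_SUFFIX[backend])
    store_dir.mkdir(parents=True, exist_ok=True)
    return store_dir


def demo() -> dict:
    """Start with LEVELDB, write one entry, then switch to ROCKSDB."""
    with tempfile.TemporaryDirectory() as tmp:
        recovery_dir = Path(tmp)
        old_store = open_state_store(recovery_dir, "LEVELDB")
        (old_store / "entry").write_text("registered executor info")
        new_store = open_state_store(recovery_dir, "ROCKSDB")
        return {
            "directories": sorted(p.name for p in recovery_dir.iterdir()),
            "old_store_kept": (old_store / "entry").exists(),
            "new_store_empty": not any(new_store.iterdir()),
        }


if __name__ == "__main__":
    print(demo())
```

In this model the new store starts empty, which matches the documented note that existing data is not automatically converted to the other kind of storage.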
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -322,9 +322,9 @@ SPARK_WORKER_OPTS supports the following system properties: | |
| <td>true</td> | ||
| <td> | ||
| Store External Shuffle service state on local disk so that when the external shuffle service is restarted, it will | ||
| automatically reload info on current executors. This only affects standalone mode (yarn always has this behavior | ||
| enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state | ||
| eventually gets cleaned up. This config may be removed in the future. | ||
| automatically reload info on current executors. This only affects standalone mode. You should also enable | ||
| <code>spark.worker.cleanup.enabled</code>, to ensure that the state eventually gets cleaned up. | ||
| This config may be removed in the future. | ||
|
Contributor
Why are we removing the yarn-related blurb from here? Essentially, this boolean does not control the behavior in yarn; for yarn, that is configured for the cluster, and inherits the behavior for spark
Contributor
Author
Hmm... does Is that incorrect?
Contributor
Author
The persistence behavior of YarnShuffleService is controlled by Yarn's configuration. It seems that it is not related to If we need to add yarn-related descriptions here, should we also mention
Contributor
Author
Or can we change to ?
|
||
| </td> | ||
| <td>3.0.0</td> | ||
| </tr> | ||
|
|
||
d8b39ef fix this
Thanks for your correction