[SPARK-40404][DOCS] Add precondition description for `spark.shuffle.service.db.backend` in `running-on-yarn.md` #37853

LuciferYang · 2022-09-12T02:58:55Z

What changes were proposed in this pull request?

Based on the context of the PR for SPARK-17321, `YarnShuffleService` persists data into LevelDB/RocksDB only when YARN NodeManager (NM) recovery is enabled. This PR therefore adds that precondition (YARN NM recovery is enabled) to the description of `spark.shuffle.service.db.backend` in `running-on-yarn.md`.

Why are the changes needed?

To add the missing precondition to the description of `spark.shuffle.service.db.backend` in `running-on-yarn.md`.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Passed GitHub Actions.

mridulm · 2022-09-12T07:38:50Z

docs/running-on-yarn.md

+    When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle 
+    service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value. 


Suggested change

-    When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle
-    service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value.
+    When work-preserving restart is enabled in YARN, this is used to specify the disk-base store used in shuffle
+    service state store, supports `LEVELDB` and `ROCKSDB` with `LEVELDB` as default value.
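For reference, the value handling that the suggested wording describes can be sketched as a small Python function. This is a hypothetical model, not Spark's code: `resolve_backend`, the dict-based `conf`, and raising `ValueError` for unknown values are assumptions made for illustration; only the two supported values and the `LEVELDB` default come from the text above.

```python
from typing import Optional

# Values the docs list as supported for spark.shuffle.service.db.backend.
SUPPORTED_BACKENDS = ("LEVELDB", "ROCKSDB")
DEFAULT_BACKEND = "LEVELDB"


def resolve_backend(conf: dict) -> str:
    """Return the shuffle-service state-store backend named in `conf`.

    Hypothetical helper: falls back to LEVELDB when the key is absent and
    rejects anything outside the two documented values (the rejection is
    an assumption of this sketch, not documented Spark behavior).
    """
    value: Optional[str] = conf.get("spark.shuffle.service.db.backend")
    if value is None:
        return DEFAULT_BACKEND
    if value not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {value!r}")
    return value


print(resolve_backend({}))  # prints LEVELDB
print(resolve_backend({"spark.shuffle.service.db.backend": "ROCKSDB"}))  # prints ROCKSDB
```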

d8b39ef fix this

Thanks for your correction

mridulm · 2022-09-12T07:39:57Z

docs/spark-standalone.md

-    eventually gets cleaned up.  This config may be removed in the future.
+    automatically reload info on current executors.  This only affects standalone mode.  You should also enable 
+    <code>spark.worker.cleanup.enabled</code>, to ensure that the state eventually gets cleaned up.  
+    This config may be removed in the future.


Why are we removing the YARN-related blurb from here? Essentially, this boolean does not control the behavior in YARN: for YARN, the behavior is configured for the cluster, and Spark inherits it.

Hmm... does "yarn always has this behavior enabled" mean that `YarnShuffleService` will always persist data into LevelDB/RocksDB?

Is that incorrect?

When `yarn.nodemanager.recovery.enabled` is true, `_recoveryPath` and `registeredExecutorFile` in `YarnShuffleService` are not null, so `YarnShuffleService` persists data into LevelDB/RocksDB.

When `yarn.nodemanager.recovery.enabled` is false, `_recoveryPath` and `registeredExecutorFile` in `YarnShuffleService` are null, so `YarnShuffleService` does not persist data into a disk store.
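The gating described in the two comments above can be modelled in a few lines of Python. Everything here is illustrative: `ShuffleServiceModel` and `from_nm_conf` are hypothetical names, and the NodeManager handing over a recovery directory is simulated rather than taken from any real API. The model deliberately never reads `spark.shuffle.service.db.enabled`, reflecting the point that on YARN persistence is decided by YARN's own setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShuffleServiceModel:
    """Toy stand-in for YarnShuffleService's recovery gating (hypothetical)."""

    # Plays the role of _recoveryPath; None means "no recovery path".
    recovery_path: Optional[str] = None

    @classmethod
    def from_nm_conf(cls, nm_conf: dict, nm_recovery_dir: str) -> "ShuffleServiceModel":
        # Simulated NodeManager: a recovery directory is passed on only when
        # yarn.nodemanager.recovery.enabled is true (treated as false if unset).
        raw = nm_conf.get("yarn.nodemanager.recovery.enabled", "false")
        enabled = raw.strip().lower() == "true"
        return cls(recovery_path=nm_recovery_dir if enabled else None)

    def persists_state(self) -> bool:
        # With no recovery path, registeredExecutorFile stays null and no
        # LevelDB/RocksDB store is opened.
        return self.recovery_path is not None


on = ShuffleServiceModel.from_nm_conf({"yarn.nodemanager.recovery.enabled": "true"}, "/tmp/nm-recovery")
off = ShuffleServiceModel.from_nm_conf({}, "/tmp/nm-recovery")
print(on.persists_state(), off.persists_state())  # prints True False
```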

The persistence behavior of `YarnShuffleService` is controlled by YARN's configuration and does not seem to be related to `spark.shuffle.service.db.enabled`, so I don't think this configuration description needs to say "yarn always has this behavior enabled".

If we need to add YARN-related descriptions here, should we also mention that Mesos always has this behavior disabled?

Or could we change "yarn always has this behavior enabled" to "The behavior of YARN (and Mesos) does not depend on this configuration"?

friendly ping @weixiuli @squito to help check this change

After an offline discussion with @weixiuli, the author of this configuration description, b92cf5c reverts the change to `docs/spark-standalone.md` in this PR.

mridulm · 2022-09-14T16:22:52Z

docs/running-on-yarn.md

-    and `LEVELDB` as default value. 
+    When work-preserving restart is enabled in YARN, this is used to specify the disk-base store used 
+    in shuffle service state store, supports `LEVELDB` and `ROCKSDB` with `LEVELDB` as default value. 
    The original data store in `LevelDB/RocksDB` will not be automatically convert to another kind of storage now.


convert -> converted ?

+1 for @mridulm's comment. Also, could you add a description of what happens at runtime when the store types are mismatched? It's deleted and recreated, right?

The old store will not be deleted, but a new one will be created. When the store type is switched, the directory name changes, for example from `registeredExecutors.ldb` to `registeredExecutors.rdb`. `YarnShuffleService` creates `registeredExecutors.rdb` if it does not exist, but it is unaware that `registeredExecutors.ldb` exists, so the old directory is not deleted.
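A minimal sketch of this switching behavior, assuming a plain directory stands in for each database: `open_store` is a hypothetical helper, and the real service opens a LevelDB or RocksDB instance at that path rather than creating an empty directory. Only the `registeredExecutors.ldb` / `registeredExecutors.rdb` names come from the comment above.

```python
import tempfile
from pathlib import Path

# Directory suffixes named in the thread: registeredExecutors.ldb / .rdb.
SUFFIX = {"LEVELDB": ".ldb", "ROCKSDB": ".rdb"}


def open_store(recovery_dir: Path, backend: str) -> Path:
    """Create (if missing) the store directory for `backend` and return it.

    Only the directory for the current backend is looked up, so a store
    left behind by the other backend is never touched or deleted.
    """
    store = recovery_dir / f"registeredExecutors{SUFFIX[backend]}"
    store.mkdir(exist_ok=True)  # create it only if it does not exist yet
    return store


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    open_store(root, "LEVELDB")  # first run with the default backend
    open_store(root, "ROCKSDB")  # backend switched, service restarted
    # prints ['registeredExecutors.ldb', 'registeredExecutors.rdb']
    print(sorted(p.name for p in root.iterdir()))
```

Both directories remain after the switch, which is why the docs state that the original store is retained rather than converted.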

How about adding "The original data store will be retained and the new type data store will be created when switching storage types."? Is that OK?

@dongjoon-hyun @mridulm Automatic data format conversion may be a useful feature; I think it would make it easier for existing users to migrate to the new feature. I have filed SPARK-40464 and will drive it to completion if necessary.

Also cc @panbingkun, regarding what we discussed offline yesterday.

mridulm · 2022-09-18T17:57:23Z

+CC @dongjoon-hyun for review

dongjoon-hyun

+1, LGTM. Thank you!

dongjoon-hyun · 2022-09-19T02:24:40Z

Merged to master for Apache Spark 3.4. Thank you, @LuciferYang and @mridulm .

…ervice.db.backend` in `running-on-yarn.md`

### What changes were proposed in this pull request?

From the context from [pr](apache#19032) of [SPARK-17321](https://issues.apache.org/jira/browse/SPARK-17321), `YarnShuffleService` will persist data into `Level/RocksDB` when Yarn NM recovery is enabled. So this pr adds the precondition description related to `Yarn NM recovery is enabled` for `spark.shuffle.service.db.backend`. in `running-on-yarn.md`

### Why are the changes needed?

Add precondition description for `spark.shuffle.service.db.backend` in `running-on-yarn.md`

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GitHub Actions

Closes apache#37853 from LuciferYang/SPARK-40404.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

fix

d8fbfbf

github-actions bot added the DOCS label Sep 12, 2022

LuciferYang mentioned this pull request Sep 12, 2022

[SPARK-40364][CORE] Use the unified DBProvider#initDB method #37826

Closed

LuciferYang changed the title ~~[SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db in the document~~ [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db. enabled in the document Sep 12, 2022

LuciferYang changed the title ~~[SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db. enabled in the document~~ [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db.enabled in the document Sep 12, 2022

LuciferYang changed the title ~~[SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db.enabled in the document~~ [SPARK-40404][DOCS] Fix the wrong description related to spark.shuffle.service.db.enabled in the document Sep 12, 2022

mridulm reviewed Sep 12, 2022

LuciferYang added 2 commits September 12, 2022 15:56

fix

d8b39ef

revert change of docs/spark-standalone.md

b92cf5c

LuciferYang changed the title ~~[SPARK-40404][DOCS] Fix the wrong description related to spark.shuffle.service.db.enabled in the document~~ [SPARK-40404][DOCS] Add precondition description for spark.shuffle.service.db.backend in running-on-yarn.md Sep 14, 2022

LuciferYang requested a review from mridulm September 14, 2022 10:04

mridulm approved these changes Sep 14, 2022

add more

88a253e

mridulm approved these changes Sep 18, 2022

dongjoon-hyun approved these changes Sep 18, 2022

dongjoon-hyun closed this in e19a729 Sep 19, 2022

