Skip to content

Conversation

@LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Sep 12, 2022

What changes were proposed in this pull request?

From the context from pr of SPARK-17321, YarnShuffleService will persist data into Level/RocksDB when Yarn NM recovery is enabled. So this pr adds the precondition description related to Yarn NM recovery is enabled for spark.shuffle.service.db.backend. in running-on-yarn.md

Why are the changes needed?

Add precondition description for spark.shuffle.service.db.backend in running-on-yarn.md

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions

@github-actions github-actions bot added the DOCS label Sep 12, 2022
@LuciferYang LuciferYang changed the title [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db in the document [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db. enabled in the document Sep 12, 2022
@LuciferYang LuciferYang changed the title [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db. enabled in the document [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db.enabled in the document Sep 12, 2022
@LuciferYang LuciferYang changed the title [SPARK-40404][DOCS] Fix the error description related to spark.shuffle.service.db.enabled in the document [SPARK-40404][DOCS] Fix the wrong description related to spark.shuffle.service.db.enabled in the document Sep 12, 2022
Comment on lines 855 to 856
When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle
service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When Yarn NodeManager recovery is enabled, this use to specify the kind of disk-base store used in shuffle
service state store, supports `LEVELDB` and `ROCKSDB` now and `LEVELDB` as default value.
When work-preserving restart is enabled in YARN, this is used to specify the disk-base store used in shuffle
service state store, supports `LEVELDB` and `ROCKSDB` with `LEVELDB` as default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d8b39ef fix this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your correction

eventually gets cleaned up. This config may be removed in the future.
automatically reload info on current executors. This only affects standalone mode. You should also enable
<code>spark.worker.cleanup.enabled</code>, to ensure that the state eventually gets cleaned up.
This config may be removed in the future.
Copy link
Contributor

@mridulm mridulm Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing the yarn related blurb from here ? Essentially, this boolean does not control the behavior in yarn - for yarn, that is configured for the cluster, and inherits the behavior for spark

Copy link
Contributor Author

@LuciferYang LuciferYang Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... does yarn always has this behavior enabled mean that YarnShuffleService will always persist data into Level/RocksDB?

Is that incorrect?

Copy link
Contributor Author

@LuciferYang LuciferYang Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • when yarn.nodemanager.recovery.enabled is true, _recoveryPath and registeredExecutorFile in YarnShuffleService will not null, then YarnShuffleService persist data into Level/RocksDB
  • when yarn.nodemanager.recovery.enabled is false, _recoveryPath and registeredExecutorFile in YarnShuffleService will null, then YarnShuffleService not persist data into diskstore

The persist behavior of YarnShuffleService is controlled by Yarn's configuration. It seems that it not related to spark.shuffle.service.db.enabled, so I don't think it is necessary to mention yarn always has this behavior enabled in this configuration description.

If we need add yarn related descriptions here, should we also need mention mesos always has this behavior disabled here...

Copy link
Contributor Author

@LuciferYang LuciferYang Sep 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or can we change

yarn always has this behavior enabled

to

The behavior of yarn (and mesos) not depend on this configuration

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping @weixiuli @squito to help check this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have an offline discussion with the author @weixiuli of this configuration description, b92cf5c revert change in this pr

@LuciferYang LuciferYang changed the title [SPARK-40404][DOCS] Fix the wrong description related to spark.shuffle.service.db.enabled in the document [SPARK-40404][DOCS] Add precondition description for spark.shuffle.service.db.backend in running-on-yarn.md Sep 14, 2022
@LuciferYang LuciferYang requested a review from mridulm September 14, 2022 10:04
and `LEVELDB` as default value.
When work-preserving restart is enabled in YARN, this is used to specify the disk-base store used
in shuffle service state store, supports `LEVELDB` and `ROCKSDB` with `LEVELDB` as default value.
The original data store in `LevelDB/RocksDB` will not be automatically convert to another kind of storage now.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convert -> converted ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for @mridulm comment. And, could you add additional description about what happens at the runtime when the the store types are mismatched. It's deleted and recreated, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old one will not be deleted, but the new one will be created. When the store type is switched, the directory name will change, for example, from registeredExecutors.ldb to registeredExecutors.rdb, YarnShuffleService will create registeredExecutors.rdb if it not exists, but YarnShuffleService did not know that registeredExecutors.ldb existed, so it will not be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add The original data store will be retained and the new type data store will be created when switching storage types. Is that ok ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun @mridulm Automatic data format conversion may be a useful feature. I think it is more friendly for migrating stock users to use new features. I have filed a Jira SPARK-40464 and will promote its completion if necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun @mridulm Automatic data format conversion may be a useful feature. I think it is more friendly for migrating stock users to use new features. I have filed a Jira SPARK-40464 and will promote its completion if necessary.

also cc @panbingkun , what I discussed with you offline yesterday

@mridulm
Copy link
Contributor

mridulm commented Sep 18, 2022

+CC @dongjoon-hyun for review

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you!

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 3.4. Thank you, @LuciferYang and @mridulm .

LuciferYang added a commit to LuciferYang/spark that referenced this pull request Sep 20, 2022
…ervice.db.backend` in `running-on-yarn.md`

### What changes were proposed in this pull request?
From the context from [pr](apache#19032) of [SPARK-17321](https://issues.apache.org/jira/browse/SPARK-17321), `YarnShuffleService` will persist data into `Level/RocksDB` when Yarn NM recovery is enabled. So this pr adds the precondition description related to `Yarn NM recovery is enabled` for `spark.shuffle.service.db.backend`. in `running-on-yarn.md`

### Why are the changes needed?
Add precondition description for `spark.shuffle.service.db.backend` in `running-on-yarn.md`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes apache#37853 from LuciferYang/SPARK-40404.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants