Conversation

@gengliangwang (Member) commented Nov 8, 2022

What changes were proposed in this pull request?

Support using RocksDB as the KVStore in live UI. The location of RocksDB can be set via configuration spark.ui.store.path. The configuration is optional. The default KV store will still be in memory.

Note: let's make it ROCKSDB only for now instead of supporting both LevelDB and RocksDB. RocksDB is built based on LevelDB with improvements on writes and reads. Furthermore, we can reuse the RocksDBFileManager in streaming for replicating the local RocksDB file to DFS. The replication in DFS can be used for the Spark history server.
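For illustration, a minimal way to enable the new option (the path value here is an arbitrary example) would be via `spark-defaults.conf` or an equivalent `--conf` flag:

```
# spark-defaults.conf -- enable the optional disk-backed live UI store
# (path is an example; any local directory works)
spark.ui.store.path  /tmp/spark-ui-store
```

When the option is unset, the KV store stays in memory, as described above.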

Why are the changes needed?

The current architecture of the Spark live UI and Spark history server (SHS) is too simple to serve large clusters and heavy workloads:

  • Spark stores all the live UI data in memory. The size can be a few GBs and affects the driver's stability (OOM).
  • There is a limit of storing only 1000 queries. Note that we can't simply raise this limit under the current architecture. I did memory profiling: storing one query execution detail can take 800 KB, while storing one task requires 0.3 KB. So for 1000 SQL queries with 1000 × 2000 tasks, the memory usage for query execution and task data will be 1.4 GB. Spark UI stores UI data for jobs/stages/executors as well, so storing 10k queries may take more than 14 GB.
  • The SHS has to parse JSON-format event logs on its initial start. The uncompressed event logs can be as big as a few GBs, and the parsing can be quite slow. Some users reported they had to wait for more than half an hour.

With RocksDB as the KVStore, we can improve the stability of the Spark driver and speed up the startup of the SHS.
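As a sanity check, the arithmetic behind the estimate above can be reproduced in a few lines. The per-item sizes are the profiling figures quoted in this description, not universal constants:

```scala
// Back-of-the-envelope live UI memory estimate, using the profiled sizes above.
object LiveUiMemoryEstimate {
  val perQueryKb: Double = 800.0 // one query execution detail, ~800 KB (profiled)
  val perTaskKb: Double = 0.3    // one task, ~0.3 KB (profiled)

  // Estimated size in GB (decimal, 1 GB = 1e6 KB) for query + task data only.
  def estimateGb(queries: Int, tasksPerQuery: Int): Double =
    (queries * perQueryKb + queries.toDouble * tasksPerQuery * perTaskKb) / 1e6

  def main(args: Array[String]): Unit =
    // 1000 queries with 2000 tasks each -> 1.4 GB, matching the figure above
    println(f"${estimateGb(1000, 2000)}%.1f GB")
}
```

The 14 GB figure for 10k queries follows from the same formula; jobs/stages/executors data comes on top of that.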

Does this PR introduce any user-facing change?

Yes, supporting RocksDB as the KVStore in live UI. The location of RocksDB can be set via configuration spark.ui.store.path. The configuration is optional. The default KV store will still be in memory.

How was this patch tested?

New UT
Preview of the doc change:
(screenshot of the documentation change)

@mridulm (Contributor) commented Nov 9, 2022

+CC @shardulm94, @thejdeep

@mridulm (Contributor) commented Nov 9, 2022

This PR mostly adds the ability to use a disk-backed store in addition to the in-memory store.
It doesn't seem to match what is described in the PR description, or to add the equivalent functionality that was reverted. Is this WIP? Or part of a larger set of changes?

@mridulm (Contributor) commented Nov 9, 2022

To comment on the proposal in the description, based on past prototypes I have worked on or seen:

Maintaining state at the driver in a disk-backed store and copying it to DFS has a few issues, particularly for larger applications.

Such setups are not very robust to application crashes, interact in nontrivial ways with the shutdown hook (HDFS failures), and increase application termination time during graceful shutdown.

Depending on application characteristics, a disk-backed store can positively or negatively impact driver performance: positively because updates are faster thanks to the index, which the in-memory store lacked (when I added an index, memory requirements increased :-( ); negatively because of increased disk activity. It was difficult to predict.

@mridulm (Contributor) commented Nov 9, 2022

Also note that we cannot avoid parsing event files at the history server with a DB generated at the driver, unless the configs match for both (retained stages, tasks, queries, etc.). This matters particularly because these can be specified by users (for example, to aggressively clean up for perf-sensitive apps), while we would want a more accurate picture at the SHS.

@gengliangwang (Member, Author) commented:

It seems to not match what is described in the pr description or add equivalent functionality which was reverted

I believe it covers the reverted PR #38542. The previous approach was to create an optional disk store for storing a subset of the SQLExecutionUIData.
It also had two shortcomings:

  • There is no eviction for the UI data on disk
  • It requires extra memory for reading/writing LevelDB/RocksDB

@gengliangwang (Member, Author) commented:

Or part of a larger set of changes

Yes, I created Jira https://issues.apache.org/jira/browse/SPARK-41053, and this one is just the beginning.

Maintaining state at driver on disk backed store and copying that to dfs has a few things which impact it - particularly for larger applications.

Yes, but it is worth starting the effort. We can keep developing it until it is ready. I ran a benchmark with the Protobuf serializer, and the results are promising.
For large applications, the current in-memory live UI can bring memory pressure which affects the driver stability as well.

@LuciferYang (Contributor) commented:

A newbie question: Does this design prefer the driver instance to monopolize a disk to ensure performance? If the driver shares a disk with other executors, will disk r/w caused by executors degrade the performance of the driver?

@mridulm (Contributor) commented Nov 9, 2022

If this is part of a larger effort, would it be better to make this an SPIP instead, @gengliangwang?
I have described some of the concerns seen in past/similar efforts, and it is not clear how this effort is approaching the solution.

@gengliangwang (Member, Author) commented:

@mridulm Yes, I can start an SPIP since you have doubts.

@gengliangwang (Member, Author) commented:

A newbie question: Does this design prefer the driver instance to monopolize a disk to ensure performance? If the driver shares a disk with other executors, will disk r/w caused by executors degrade the performance of the driver?

@LuciferYang I don't expect the RocksDB file to be large. This is an optional solution, it won't become the default at least for now.

@gengliangwang (Member, Author) commented:

@mridulm FYI I have sent the SPIP to the dev list.


```scala
try {
  open(dbPath, metadata, conf, Some(diskBackend))
} catch {
```
Contributor:

For Live UI, are these two exception scenarios possible?

Member (Author):

No, and the LevelDB exception is not possible either. There are only two error-handling branches, and it seems too much to distinguish whether the caller is the live UI or the SHS.

Contributor:

If unexpected data cleaning doesn't occur in the live UI scenario, I think it is OK.

Contributor:

If something goes wrong and the KV store can't be created for the live UI, does the entire application fail?

Member (Author):

@tgravescs If there is an exception from RocksDB, it will delete the target directory and open the KV store again. Otherwise, if the error is not from RocksDB, it will fall back to the in-memory store.
It follows the same logic as the SHS here.
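A minimal sketch of that fallback shape (the type names and the `open` helper here are illustrative stand-ins, not the actual Spark internals): a RocksDB error wipes the target directory and retries once; any other error falls back to the in-memory store.

```scala
import java.io.File

// Illustrative stand-ins for the real KV store types in this PR.
sealed trait Store
case object DiskStore extends Store
case object MemoryStore extends Store

// Stand-in for the exception RocksDB throws on corrupt/incompatible files.
class RocksDbError(msg: String) extends Exception(msg)

def openWithFallback(path: File, open: File => Store): Store =
  try open(path)
  catch {
    case _: RocksDbError =>
      // RocksDB failure: delete the target directory and try once more.
      Option(path.listFiles()).foreach(_.foreach(_.delete()))
      path.delete()
      try open(path) catch { case _: Exception => MemoryStore }
    case _: Exception =>
      // Any other failure: fall back to the in-memory store.
      MemoryStore
  }
```

The key design point discussed here is that a store-creation failure never fails the application: the worst case is reverting to the current in-memory behavior.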

@mridulm
Copy link
Contributor

mridulm commented Nov 17, 2022

Sorry for the delay, I will try to review this later this week ...

@mridulm (Contributor) left a comment:

Just minor comments; looks good to me.

@mridulm (Contributor) left a comment:

Thanks for working on this @gengliangwang!
Looking forward to the next PR in this SPIP :-)

@LuciferYang (Contributor) left a comment:

+1, LGTM



@tgravescs (Contributor) left a comment.

@gengliangwang (Member, Author) commented:

@mridulm @LuciferYang @tgravescs Thanks for the review.
I am merging this one to the master branch.

```scala
val store = new ElementTrackingStore(new InMemoryStore(), conf)
val storePath = conf.get(LIVE_UI_LOCAL_STORE_DIR).map(new File(_))
// For the disk-based KV store of live UI, let's simply make it ROCKSDB only for now,
// instead of supporting both LevelDB and RocksDB. RocksDB is built based on LevelDB with
```
Contributor:

Not related to this PR, but I'm wondering why we provided these two choices in the first place. RocksDB alone seems sufficient.


@dongjoon-hyun (Member) commented Nov 23, 2022:

According to the context, I thought Reynold mentioned that the legal issue was resolved. Do you mean that Apache Spark still has a legal issue with RocksDB, @gengliangwang?

(screenshot of the referenced discussion)

Member (Author):

@dongjoon-hyun I was explaining why @vanzin chose LevelDB. Maybe I was wrong about his reason.

Do you mean that Apache Spark still has a legal issue with RocksDB

I don't think so. If it did, I wouldn't choose it as the only disk backend of the live UI.

Member:

Got it. Thank you for confirming that.

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
### What changes were proposed in this pull request?

Support using RocksDB as the KVStore in live UI. The location of RocksDB can be set via configuration `spark.ui.store.path`. The configuration is optional. The default KV store will still be in memory.

Note: let's make it ROCKSDB only for now instead of supporting both LevelDB and RocksDB. RocksDB is built based on LevelDB with improvements on writes and reads. Furthermore, we can reuse the RocksDBFileManager in streaming for replicating the local RocksDB file to DFS. The replication in DFS can be used for the Spark history server.

### Why are the changes needed?

The current architecture of the Spark live UI and Spark history server (SHS) is too simple to serve large clusters and heavy workloads:

- Spark stores all the live UI data in memory. The size can be a few GBs and affects the driver's stability (OOM).
- There is a limit of storing only 1000 queries. Note that we can't simply raise this limit under the current architecture. I did memory profiling: storing one query execution detail can take 800 KB, while storing one task requires 0.3 KB. So for 1000 SQL queries with 1000 × 2000 tasks, the memory usage for query execution and task data will be 1.4 GB. Spark UI stores UI data for jobs/stages/executors as well, so storing 10k queries may take more than 14 GB.
- The SHS has to parse JSON-format event logs on its initial start. The uncompressed event logs can be as big as a few GBs, and the parsing can be quite slow. Some users reported they had to wait for more than half an hour.

With RocksDB as the KVStore, we can improve the stability of the Spark driver and speed up the startup of the SHS.

### Does this PR introduce _any_ user-facing change?

Yes, supporting RocksDB as the KVStore in live UI. The location of RocksDB can be set via configuration `spark.ui.store.path`. The configuration is optional. The default KV store will still be in memory.

### How was this patch tested?

New UT
Preview of the doc change:
<img width="895" alt="image" src="https://user-images.githubusercontent.com/1097932/203184691-b6815990-b7b0-422b-aded-8e1771c0c167.png">

Closes apache#38567 from gengliangwang/liveUIKVStore.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022


6 participants