[SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore #29149
cc @HeartSaVioR @mridulm @tgravescs ^^
try (WriteBatch batch = db().createWriteBatch()) {
  while (valueIter.hasNext()) {
    final Object value = valueIter.next();
Adding one value (L204-L219) looks to be the same as write() - let's extract and deduplicate.
Done
while (it.hasNext()) {
  levelDB.write(it.next())
}
val values = Lists.newArrayList(
This would be OK, given all entries are from inMemoryStore which are already materialized into memory.
HeartSaVioR
left a comment
Looks OK in general. Just a minor comment. I'd like to wait for others to review as well if it doesn't hold too long.
ok to test
add to whitelist
Test build #126130 has finished for PR 29149 at commit
Test build #126129 has finished for PR 29149 at commit
@mridulm @tgravescs
I likely won't have time for a review so go ahead without mine
OK I'll go ahead merging. To be sure I'll trigger test once again.
retest this, please
Test build #126278 has finished for PR 29149 at commit
retest this, please
Test build #126284 has finished for PR 29149 at commit
retest this, please
Test build #126290 has finished for PR 29149 at commit
Thanks! Merging to master.
Thanks for the review!
Sorry for the delay in getting to this. This can be trivially done with an inner loop doing The performance actually improves for larger list sizes (due to memory pressure reducing - particularly in SHS), while the smaller lists suffer from minimal impact
This seems an important improvement. Should I put up a followup PR to include this change?
That would be great, thanks @baohe-zhang!
What changes were proposed in this pull request?
The idea is to improve the performance of HybridStore by adding batch-write support to LevelDB. #28412 introduced HybridStore, which first writes data to InMemoryStore and then uses a background thread to dump the data to LevelDB once writing to InMemoryStore has completed. In the comments on #28412, @mridulm mentioned that batch writing could improve the performance of this dumping process, and he wrote the code for writeAll().
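To make the difference between the two dump strategies concrete, here is a minimal toy sketch. It is not the code from this PR and does not use the LevelDB API: `ToyDiskStore`, `dumpOneByOne`, and `dumpBatched` are hypothetical names, and a "commit" counter stands in for the per-write cost of a real disk-backed store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: ToyDiskStore is a hypothetical stand-in for a disk-backed
// key-value store; it is not the LevelDB or Spark kvstore API.
class BatchWriteSketch {

    static class ToyDiskStore {
        private final Map<String, String> data = new LinkedHashMap<>();
        private int commits = 0;

        // One-by-one path: each entry is committed on its own.
        void write(String key, String value) {
            data.put(key, value);
            commits++;
        }

        // Batch path: stage every entry first, then apply them in one commit.
        void writeAll(Map<String, String> entries) {
            Map<String, String> batch = new LinkedHashMap<>(entries);
            data.putAll(batch);
            commits++;
        }

        int commits() { return commits; }

        int size() { return data.size(); }
    }

    // Dump an in-memory map entry by entry; returns the number of commits made.
    static int dumpOneByOne(Map<String, String> inMemory, ToyDiskStore disk) {
        for (Map.Entry<String, String> e : inMemory.entrySet()) {
            disk.write(e.getKey(), e.getValue());
        }
        return disk.commits();
    }

    // Dump the same map through a single batch; returns the number of commits made.
    static int dumpBatched(Map<String, String> inMemory, ToyDiskStore disk) {
        disk.writeAll(inMemory);
        return disk.commits();
    }

    public static void main(String[] args) {
        Map<String, String> inMemory = new LinkedHashMap<>();
        for (int i = 0; i < 1024; i++) {
            inMemory.put("key-" + i, "value-" + i);
        }
        System.out.println("one-by-one commits: " + dumpOneByOne(inMemory, new ToyDiskStore()));
        System.out.println("batched commits: " + dumpBatched(inMemory, new ToyDiskStore()));
    }
}
```

The sketch only counts commits; it says nothing about real timings, which depend on the disk and on how the underlying store implements its batches.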
Why are the changes needed?
I compared the HybridStore switching time between one-by-one writes and batch writes on an HDD. When the disk is idle, batch writing improves the switching time by around 25%; when the disk is 100% busy, it improves it by 7x to 10x.
[Figure: switching time when the disk is at 0% utilization]
[Figure: switching time when the disk is at 100% utilization]
I also ran some write-related benchmarks in LevelDBBenchmark.java and measured the total time to write 1024 objects. The tests were run with the disk at 0% utilization.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually tested.