Uses LruCache instead of InversionTree for caching data decode matrices #104
Conversation
@nazar-pc can you please take a look? Thank you.
Force-pushed from f1adaa9 to 6a9e3dd.
Thanks, I'll try to take a look and bench it later this week.
#74 has some context on why an eviction policy is needed. LRU eviction was added in #83. But as implemented, incrementing … The eviction code implemented here: … The two atomic operations and the mutex lock here: … The combination of all these means that when you have a persistent instance of a ReedSolomon encoder/decoder and you keep calling into it, …
I have done a simple benchmark and it is actually slower than the older code.

Before:

```
reconstruct :
shards : 10 / 2
shard length : 1048576
time taken : 0.12414031
byte count : 524288000
MB/s : 4027.700591371167

reconstruct :
shards : 10 / 10
shard length : 1048576
time taken : 0.133304495
byte count : 524288000
MB/s : 3750.811253589011
```

After:

```
reconstruct :
shards : 10 / 2
shard length : 1048576
time taken : 0.150660363
byte count : 524288000
MB/s : 3318.722921170713

reconstruct :
shards : 10 / 10
shard length : 1048576
time taken : 0.154363785
byte count : 524288000
MB/s : 3239.1017102878113
```
I'm now wondering if we should just have a small-ish hashmap (say, 256 entries) and, when it gets full, simply stop adding more entries, since the application will likely have random inputs all the time anyway, rendering caching useless.
Then we don't need LRU at all. What do you think?
Cargo.toml (Outdated)

```diff
@@ -43,6 +43,7 @@ parking_lot = { version = "0.11.2", optional = true }
 smallvec = "1.2"
 # `Mutex` implementation for `no_std` environment with the same high-level API as `parking_lot`
 spin = { version = "0.9.2", default-features = false, features = ["spin_mutex"] }
+lru = "0.7.8"
```
Please keep dependencies sorted.
done!
```rust
use super::Field;
use super::ReconstructShard;

const DATA_DECODE_MATRIX_CACHE_CAPACITY: usize = 254;
```
This is an odd number. Can you elaborate on the rationale behind this number, maybe leaving a comment in the code too?
I just preserved the current cache capacity from the existing code, so as not to introduce any change in behavior or performance due to a cache-size change:
https://github.com/rust-rse/reed-solomon-erasure/blob/eb1f66f47/src/inversion_tree.rs#L15
Force-pushed from 6a9e3dd to 90174e9.
It can depend on the setup and the actual benchmark, in particular how many data/parity shards there are and how often you hit or miss the cache, or have to evict entries from it.

So effectively we pay the price of traversing the tree and caching decode matrices, but not only is there no cache hit and hence no gain, the code is also constantly evicting entries from the cache using the very inefficient eviction path I explained earlier. In our codebase I see a very significant gain (50%+) in some of our metrics just by switching from the inversion tree to the LRU cache. I am guessing the setup for your benchmark hits a sweet spot for the current implementation of the inversion tree. If you share the code for the benchmark I can look into it further. Also, please note that besides performance there are other issues with the current implementation, as mentioned earlier here: #104 (comment)
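One way to see why random erasure patterns defeat a small cache: the cache key is effectively the set of missing shards, and the number of such sets grows combinatorially. A quick illustrative calculation (the 10 data / 4 parity shard counts are assumed for the example, not taken from this PR):

```rust
/// Binomial coefficient C(n, k), computed incrementally so each
/// intermediate division is exact.
fn binomial(n: u64, k: u64) -> u64 {
    (0..k).fold(1u64, |acc, i| acc * (n - i) / (i + 1))
}

fn main() {
    // With 10 data + 4 parity shards, up to 4 of the 14 shards can be
    // missing, so the number of distinct erasure patterns is
    // sum over k = 0..=4 of C(14, k).
    let total: u64 = (0..=4).map(|k| binomial(14, k)).sum();
    println!("possible erasure patterns: {}", total); // 1471
    // A 254-entry cache covers only a small fraction of them, so random
    // patterns mostly miss and keep triggering eviction.
    assert!(total > 254);
}
```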
That sounds fair to me, as long as the change of behavior is not a concern. My objective here was to keep the behavior as close as possible to the current intended design, hence preserving the LRU policy.
https://github.com/rust-rse/rse-benchmark is the benchmark I was using. If the change in behavior is positive, I see no reason not to do it. But I'd like to hear some feedback from other maintainers first.
Force-pushed from 90174e9 to 37c06a1.
Looking at that benchmark code, it is always only removing the first shard. With that setting, the benchmark never really hits the performance bottlenecks of the current implementation. I added benchmark code in this same pull request as a separate commit. You can run the benchmark with and without the last commit to compare. On my machine, with the current code … whereas with this change … So there is a massive improvement by switching from `InversionTree` to `LruCache`.
Force-pushed from 37c06a1 to 26b9918.
The current implementation of the LRU eviction policy on InversionTree is wrong and inefficient. This commit removes InversionTree and instead uses LruCache to cache data decode matrices.
Force-pushed from 26b9918 to 3660ce1.
@nazar-pc I updated the benchmark code to test different combinations of the number of data shards and the number of parity shards.
With the current `InversionTree` code: … Using `LruCache`: …
FWIW, I'm fine with this change. I'm also heavily biased, as @behzadnouri is on my team.
Makes sense to me, and thanks for adding the reconstruction bench. Ideally we'd move all the benches in here, maybe with criterion, for easier testing in the future.
@nazar-pc thanks for approving the PR.
@mvines can you update the changelog and version and do another release? I'm lazy today 🙂
Heh, yep. That seems fair to me :)
v6.0.0 is now on crates.io
Thank you both.
Need to pick up: rust-rse/reed-solomon-erasure#104 in order to unblock: solana-labs#27510