Improve memory efficiency of seen cache #1073
Conversation
The `seen` cache is currently a significant memory usage hotspot due to its inefficient implementation: for every entry, two copies of the message id + timing data + `seq` overhead cause it to use much more memory than it has to. In addition, each check involves several layers of allocations as the computed message id gets salted.

This PR improves on the situation by:

* using a hash of the message id with the salt instead of joining strings
* computing the salted id only once per message
* storing one digest instead of two message ids
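The three bullet points can be sketched roughly like this (an illustrative sketch only, not the actual nim-libp2p code: the names `SaltedId` and `saltedId` are made up, and `std/hashes` stands in for whatever digest the real implementation uses):

```nim
import std/hashes

type
  # One fixed-size digest per entry instead of two copies of the message id
  # plus `seq` overhead.
  SaltedId = distinct Hash

proc `==`(a, b: SaltedId): bool = a.Hash == b.Hash

proc saltedId(salt: string, messageId: openArray[byte]): SaltedId =
  # Mix the salt into the hash directly rather than allocating a joined
  # string; the caller computes this once per message and reuses the digest
  # for every subsequent cache lookup.
  var h: Hash = 0
  h = h !& hash(salt)
  h = h !& hash(messageId)
  SaltedId(!$h)
```

A seen-cache entry then only needs to store the `SaltedId` and its timing data, which is where the memory savings come from.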
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             master    #1073    +/- ##
=========================================
  Coverage          ?   84.53%
=========================================
  Files             ?       91
  Lines             ?    15517
  Branches          ?        0
=========================================
  Hits              ?    13118
  Misses            ?     2399
  Partials          ?        0
=========================================
```
On holesky, this PR reduces memory usage of the seen cache by ~100 MB.
```nim
let
  previous = t.del(k) # Refresh existing item
  addedAt = if previous.isSome():
    previous.addedAt
```
We had a long PR in the past to remove this pattern from the codebase and decrease the risk of raising defects. You can use https://github.com/vacp2p/nim-libp2p/blob/unstable/libp2p/utility.nim#L125
`valueOr` is not applicable in this case because we're accessing a field of `previous[]`, not `previous` itself
True, but `withValue` can be used in this case.
`withValue` doesn't work in generic code, due to similar problems as arnetheduck/nim-results#34
This seems to work fine:

```nim
addedAt = block:
  previous.withValue(p):
    p[].addedAt
  else:
    now
```
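For readers without the nim-libp2p tree at hand, the `withValue`/`else` pattern used above can be reproduced with a small template over `Option` (a simplified stand-alone illustration, not the actual helper in `libp2p/utility.nim`):

```nim
import std/options

template withValue[T](opt: Option[T], name, body, elseBody: untyped): untyped =
  # Bind `name` to the contained value and run `body` when the Option is
  # some; the trailing `else:` block is parsed as the `elseBody` argument.
  if opt.isSome():
    let name = opt.unsafeGet()
    body
  else:
    elseBody

let previous = some(42)
let addedAt = block:
  previous.withValue(p):
    p
  else:
    0
```

Here `addedAt` evaluates to 42; with `none(int)` the `else:` branch would yield 0 instead, avoiding any direct `get()` call that could raise a Defect.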