Dynamic LMDB mapsize allocation [1.1.0] #2605
Conversation
phatness: u64,
}

impl PhatChunkStruct {
😹
Is it worth considering a "double the size each time" strategy?
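For a sense of how the two strategies diverge, here is a small self-contained sketch (the helper names are hypothetical, not from this PR):

```rust
/// 128 MB, the fixed growth chunk proposed in this PR.
const ALLOC_CHUNK_SIZE: usize = 134_217_728;

/// Grow by a fixed chunk each time (the approach in this PR).
fn grow_fixed(mapsize: usize) -> usize {
    mapsize + ALLOC_CHUNK_SIZE
}

/// Double the mapsize each time (the alternative floated here).
fn grow_doubling(mapsize: usize) -> usize {
    mapsize.saturating_mul(2)
}

fn main() {
    let (mut fixed, mut doubled) = (ALLOC_CHUNK_SIZE, ALLOC_CHUNK_SIZE);
    for step in 1..=6 {
        fixed = grow_fixed(fixed);
        doubled = grow_doubling(doubled);
        // After 6 resizes: fixed growth reaches ~0.9 GB, doubling reaches 8 GB.
        println!("resize {}: fixed = {} MB, doubling = {} MB", step, fixed >> 20, doubled >> 20);
    }
}
```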
env_info.mapsize + ALLOC_CHUNK_SIZE
};
unsafe {
	self.env.set_mapsize(new_mapsize)?;
You need to make sure there are no active txns before resizing. See: https://github.com/monero-project/monero/blob/master/src/blockchain_db/lmdb/db_lmdb.cpp#L517-L539
Yep 👍 I'll look into enforcing that. I was thinking that calling it only from the batch creation function would help here, but of course that doesn't take multiple threads with open txns into account.
In the monero code it seems to be implemented via a simple reference count of a global atomic. @antiochp @ignopeverell As far as I can see within the node, the Store struct is never wrapped in any mutexes; can you confirm whether there are multiple threads trying to access the ChainStore or PeerStore at any given time?
There's also a bit of an issue here with multiple wallet invocations trying to access the store at the same time, which is possible under the current architecture.
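To illustrate the Monero-style approach in Rust terms, here is a rough, self-contained sketch of gating txns on an atomic reference count; it is an illustration of the concept only, not code from this PR or from Monero:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

/// Conceptual equivalent of the linked Monero logic: block new txns while a
/// resize is pending, and wait for active txns to drain before resizing.
struct TxnGate {
    active_txns: AtomicUsize,
    resize_pending: AtomicBool,
}

impl TxnGate {
    fn new() -> Self {
        TxnGate {
            active_txns: AtomicUsize::new(0),
            resize_pending: AtomicBool::new(false),
        }
    }

    /// Register an active txn, waiting if a resize is about to run.
    fn begin_txn(&self) {
        loop {
            while self.resize_pending.load(Ordering::SeqCst) {
                thread::sleep(Duration::from_millis(1));
            }
            self.active_txns.fetch_add(1, Ordering::SeqCst);
            // Re-check: if a resize slipped in between the check and the
            // increment, back out and wait again.
            if self.resize_pending.load(Ordering::SeqCst) {
                self.active_txns.fetch_sub(1, Ordering::SeqCst);
                continue;
            }
            return;
        }
    }

    fn end_txn(&self) {
        self.active_txns.fetch_sub(1, Ordering::SeqCst);
    }

    /// Stop new txns, wait for in-flight ones to finish, then resize.
    fn resize_when_quiet(&self, do_resize: impl FnOnce()) {
        self.resize_pending.store(true, Ordering::SeqCst);
        while self.active_txns.load(Ordering::SeqCst) > 0 {
            thread::sleep(Duration::from_millis(1));
        }
        do_resize();
        self.resize_pending.store(false, Ordering::SeqCst);
    }
}

fn main() {
    let gate = TxnGate::new();
    gate.begin_txn();
    gate.end_txn();
    gate.resize_when_quiet(|| println!("resizing with no active txns"));
}
```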
Are we talking any txns here? Or just write txns? (All lmdb access is via a read txn or write txn).
If write txns then lmdb itself is the mutex - it only supports a single write txn at a time (across all threads).
If we successfully create a batch then we're good to go - we guarantee no other thread has a write txn active.
My understanding is that setting the db mapsize rearranges the indices, affecting reads, too.
We don't actually have any mechanism to expose long-lived read txns via our store impl. So I think we're fine there, every read is effectively in its own read txn currently.
So if we take a write lock via our batch and then resize the db we should be good - the next read will simply create a new read txn on the resized db.
@@ -24,6 +24,10 @@ use lmdb_zero::LmdbResultExt;

use crate::core::ser;

/// number of bytes to grow the database by when needed
pub const ALLOC_CHUNK_SIZE: usize = 134_217_728; //128 MB
Seems a bit low. When Grin has full blocks, this could happen every hour or so, especially for archive peers. I'd be curious to see some metrics around how long this takes to resize. I'd also be interested to see how disruptive mdb_txn_safe::prevent_new_txns() and mdb_txn_safe::wait_no_active_txns() are (see comment on line 152). If either of those operations have a noticeable effect on performance, it'd be better to resize less often.
Sure, can try and collect some metrics once that's implemented.
Just another thought here: it might be debatable whether this is the right size for the chain, but it's already too large for the wallet and peer DB. I might think about making this a parameter somehow without adding too much cruft.
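For example, one low-cruft way to make it a parameter could be a per-store config with a default; the names here are hypothetical, not from this PR:

```rust
/// Default growth chunk: 128 MB, sized with the chain db in mind.
const DEFAULT_ALLOC_CHUNK_SIZE: usize = 134_217_728;

/// Hypothetical per-store configuration so the wallet and peer stores can
/// use a much smaller growth chunk than the chain store.
struct StoreConfig {
    alloc_chunk_size: usize,
}

impl Default for StoreConfig {
    fn default() -> Self {
        StoreConfig { alloc_chunk_size: DEFAULT_ALLOC_CHUNK_SIZE }
    }
}

fn main() {
    let chain = StoreConfig::default();                     // 128 MB chunks
    let wallet = StoreConfig { alloc_chunk_size: 1 << 21 }; // 2 MB chunks
    println!("chain: {} B, wallet: {} B", chain.alloc_chunk_size, wallet.alloc_chunk_size);
}
```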
Good point. Maybe we should consider @antiochp's doubling proposal? Or will that eventually be too excessive?
Seems easy enough to tune once we have live data.
}

/// Increments the database size by one ALLOC_CHUNK_SIZE
pub fn do_resize(&self) -> Result<(), Error> {
Should we check available space, and try to fail more gracefully? Or do you think that's more complexity than it's worth? It's trivial to do a check like that in C++, but not sure if Rust provides APIs for that.
I'd thought about this as well, but if you try to allocate more than the available disk space you get:
Error: LmdbErr(Error::Code(12, 'Cannot allocate memory'))
Which I think is graceful enough without having to add more complexity here.
Cool. I agree.
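For reference, here is a minimal sketch of the kind of chunked resize under review, written against the lmdb-zero API the store uses; the info()/set_mapsize() calls appear in the diff above, but the standalone shape and error handling here are assumptions, not the PR's actual code:

```rust
use lmdb_zero as lmdb;

/// Number of bytes to grow the database by when needed (from the diff above).
pub const ALLOC_CHUNK_SIZE: usize = 134_217_728; // 128 MB

/// Grow the map by one chunk. Per the discussion above, this must only run
/// while no write txn is open against the environment.
fn do_resize(env: &lmdb::Environment) -> Result<(), lmdb::Error> {
    let env_info = env.info()?;
    let new_mapsize = env_info.mapsize + ALLOC_CHUNK_SIZE;
    // set_mapsize is unsafe in lmdb-zero because resizing underneath live
    // txns is undefined behaviour. Over-allocation beyond available disk
    // space surfaces as the "Cannot allocate memory" (code 12) error
    // mentioned above.
    unsafe {
        env.set_mapsize(new_mapsize)?;
    }
    Ok(())
}
```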
Right, unfortunately just performing the resize within calls to batch() doesn't appear to be enough on its own; the db needs to be closed and reopened around the resize. I've tested this by setting the chunk size to something very small (2MB) and syncing a chain from scratch. With the close/reopen behaviour in place it expands the db as needed and fully syncs without issue. Without it, it inevitably segfaults somewhere. The downside here is that the calls to open and close the db, and therefore access to the store as a whole, need to be made exclusive while this happens.
Also refactored the store itself to be a bit more encapsulated and ensure callers don't need to explicitly import the lmdb crate.
That would mean we lose any ability to have multiple readers on the db simultaneously?
I've just tried implementing this as an RwLock here. How much of a performance hit is this likely to be for the peer store?
chain/src/chain.rs
@@ -142,7 +141,7 @@ impl OrphanBlockPool {
/// maintains locking for the pipeline to avoid conflicting processing.
pub struct Chain {
	db_root: String,
	store: Arc<store::ChainStore>,
	store: Arc<RwLock<store::ChainStore>>,
I'm fine with the use of a RwLock, but can't this be pushed down into our LMDB store? All regular operations would be considered a read, except for the close/open of the DB which would take the write.
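A rough sketch of the shape being suggested, with a placeholder type standing in for the lmdb environment; nothing here is the actual store API:

```rust
use std::sync::{Arc, RwLock};

/// Placeholder for the open lmdb environment/database handles.
struct Db;

/// The lock lives inside the store, so callers hold a plain Arc<Store> and
/// never see the RwLock (or lmdb) directly.
pub struct Store {
    db: Arc<RwLock<Option<Db>>>,
}

impl Store {
    /// Regular operations take the read guard; many can run concurrently.
    pub fn get(&self, _key: &[u8]) -> Option<Vec<u8>> {
        let guard = self.db.read().unwrap();
        let _db = guard.as_ref().expect("db closed");
        // ... perform the lookup via a short-lived read txn ...
        None
    }

    /// Only the close/reopen path (used for resizing) takes the write guard,
    /// briefly excluding all other access while the environment is swapped.
    pub fn reopen(&self) {
        let mut guard = self.db.write().unwrap();
        *guard = None;     // close the environment
        *guard = Some(Db); // ... resize and reopen ...
    }
}

fn main() {
    let store = Store { db: Arc::new(RwLock::new(Some(Db))) };
    store.reopen();
    assert!(store.get(b"k").is_none());
}
```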
Took a bit of doing and testing, but I think I've managed it in the latest push.
I think this is ready for merging if anyone wants to give a final review and a little thumbs up somewhere. Since the last comments I've:
Very nice looking, I like that it removes the lmdb dependency in a few crates!
I suspect effectively zero given everything else going on.
Mostly to support #2525, but also to make the backend store a bit more flexible. This:

* checks whether the db map needs to grow whenever store.batch() is called. If so, it increases the mapsize in 128MB chunks until the space used relative to the mapsize is beneath a threshold (currently 65%, can be tweaked later).

Does this need to be any more complicated than this?
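As a rough illustration of that check against the lmdb-zero API: the 65% figure comes from this description, while the field names (last_pgno, psize, mapsize) follow lmdb-zero's EnvInfo/Stat and the helper name is an assumption:

```rust
use lmdb_zero as lmdb;

/// Grow once the space in use exceeds this fraction of the current mapsize.
const RESIZE_PERCENT: f64 = 0.65;

/// Sketch of the check run when store.batch() is called: estimate the bytes
/// in use from the page size and highest page number, and compare against
/// the threshold.
fn needs_resize(env: &lmdb::Environment) -> Result<bool, lmdb::Error> {
    let info = env.info()?;
    let stat = env.stat()?;
    let size_used = stat.psize as usize * info.last_pgno;
    Ok(size_used as f64 > info.mapsize as f64 * RESIZE_PERCENT)
}
```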