
Dynamic LMDB mapsize allocation [1.1.0] #2605

Merged: 16 commits, Feb 27, 2019

Conversation

yeastplume
Member

@yeastplume yeastplume commented Feb 19, 2019

Mostly to support #2525, but also to make the backend store a bit more flexible. This:

  • Allocates DB space in chunks of 128MB at a time
  • Checks whether more space is needed every time store.batch() is called; if so, increases the mapsize in 128MB chunks until the space used relative to the mapsize falls beneath a threshold (currently 65%, which can be tweaked later)

Does this need to be any more complicated than this?
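As an illustrative sketch of the allocation policy described above (the chunk size and threshold match the PR description; the function name and shape are hypothetical, not the actual grin-store code):

```rust
/// Number of bytes to grow the database by when needed (128 MB, as in the PR).
const ALLOC_CHUNK_SIZE: u64 = 134_217_728;
/// Resize when used space exceeds this fraction of the current mapsize.
const RESIZE_PERCENT: f64 = 0.65;

/// Returns the new mapsize if a resize is needed, growing in fixed
/// 128 MB chunks until usage falls below the threshold; None otherwise.
fn target_mapsize(size_used: u64, mapsize: u64) -> Option<u64> {
    if (size_used as f64) < RESIZE_PERCENT * mapsize as f64 {
        return None;
    }
    let mut new_mapsize = mapsize;
    while size_used as f64 >= RESIZE_PERCENT * new_mapsize as f64 {
        new_mapsize += ALLOC_CHUNK_SIZE;
    }
    Some(new_mapsize)
}
```

On each store.batch() call, a check along these lines would decide whether to bump the mapsize before the write txn is opened.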

phatness: u64,
}

impl PhatChunkStruct {
Member

😹

@yeastplume yeastplume changed the title Dynamic LMDB mapsize allocation Dynamic LMDB mapsize allocation [1.1.0] Feb 19, 2019
@yeastplume yeastplume added this to the 1.1.0 milestone Feb 19, 2019
@antiochp
Copy link
Member

Is it worth considering doing a "double the size each time" strategy?
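For comparison, a doubling strategy (hypothetical sketch only; the PR as described uses fixed chunks) might look like:

```rust
/// Hypothetical alternative growth policy: double the mapsize on each
/// resize rather than adding fixed chunks. Fewer resizes as the DB
/// grows, at the cost of over-allocating address space.
fn doubled_mapsize(mapsize: u64, size_used: u64) -> u64 {
    let mut new_mapsize = mapsize;
    // keep doubling until usage is below half the new mapsize
    while size_used >= new_mapsize / 2 {
        new_mapsize = new_mapsize.saturating_mul(2);
    }
    new_mapsize
}
```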

env_info.mapsize + ALLOC_CHUNK_SIZE
};
unsafe {
self.env.set_mapsize(new_mapsize)?;
Member Author

Yep 👍 I'll look into enforcing that. I was thinking that calling it only from the batch creation function would help here, but of course that doesn't take multiple threads with open txns into account.

Member Author

In the Monero code it seems to be implemented via a simple reference count on a global atomic. @antiochp @ignopeverell As far as I can see within the node, the Store struct is never wrapped in any mutexes; can you confirm whether multiple threads might be trying to access the ChainStore or PeerStore at any given time?

Also a bit of an issue here with multiple wallet invocations trying to access the store at the same time, which is possible under current architecture.

@antiochp (Member) Feb 20, 2019

Are we talking any txns here? Or just write txs? (All lmdb access is via a read txn or write txn).

If write txns then lmdb itself is the mutex - it only supports a single write txn at a time (across all threads).
If we successfully create a batch then we're good to go - we guarantee no other thread has a write txn active.
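The single-writer invariant described here can be modelled with a toy sketch (illustrative only; a std Mutex stands in for LMDB's internal writer lock, and Db/Batch are not the real grin-store types):

```rust
use std::sync::{Mutex, MutexGuard};

/// Toy model: LMDB permits a single write txn at a time, so holding a
/// "batch" implies no other thread is writing.
struct Db {
    writer: Mutex<()>,
}

/// While a Batch is alive it owns the writer lock, so writer-exclusive
/// work (such as a resize) is safe.
struct Batch<'a> {
    _write_lock: MutexGuard<'a, ()>,
}

impl Db {
    fn batch(&self) -> Batch<'_> {
        // blocks until we are the sole writer
        Batch {
            _write_lock: self.writer.lock().unwrap(),
        }
    }
}
```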

Contributor

My understanding is that setting the db mapsize rearranges the indices, affecting reads, too.

Member

We don't actually have any mechanism to expose long-lived read txns via our store impl, so I think we're fine there; every read is effectively in its own read txn currently.
So if we take a write lock via our batch and then resize the db we should be good - the next read will simply create a new read txn on the resized db.

@@ -24,6 +24,10 @@ use lmdb_zero::LmdbResultExt;

use crate::core::ser;

/// number of bytes to grow the database by when needed
pub const ALLOC_CHUNK_SIZE: usize = 134_217_728; //128 MB
Contributor

Seems a bit low. When Grin has full blocks, this could happen every hour or so, especially for archive peers. I'd be curious to see some metrics around how long this takes to resize. I'd also be interested to see how disruptive mdb_txn_safe::prevent_new_txns() and mdb_txn_safe::wait_no_active_txns() are (see comment on line 152). If either of those operations have a noticeable effect on performance, it'd be better to resize less often.

Member Author

Sure, can try and collect some metrics once that's implemented.

Member Author

Just another thought here: it might be debatable whether this is the right size for the chain, but it's already too large for the wallet and peer DBs. I might think about making this a parameter somehow, without adding too much cruft.

Contributor

Good point. Maybe we should consider @antiochp's doubling proposal? Or will that eventually be too excessive?

Contributor

Seems easy enough to tune once we have live data.

}

/// Increments the database size by one ALLOC_CHUNK_SIZE
pub fn do_resize(&self) -> Result<(), Error> {
Contributor

Should we check available space, and try to fail more gracefully? Or do you think that's more complexity than it's worth? It's trivial to do a check like that in C++, but not sure if Rust provides APIs for that.

Member Author

I'd thought about this as well, but if you try to allocate more than the available disk space you get:

Error: LmdbErr(Error::Code(12, 'Cannot allocate memory'))

Which I think is graceful enough without having to add more complexity here.

Contributor

Cool. I agree.

@yeastplume
Member Author

Right, unfortunately just performing the resize within calls to batch results in segfaulty behaviour somewhere in LMDB. It would seem, all things considered, that the safest thing to do here on resize is to close/drop the database entirely, perform the resize, then re-open.

I've tested this by setting the chunk size to something very small (2MB) and syncing a chain from scratch. With the close/reopen behaviour in place it expands the db as needed and fully syncs without issue. Without it, it inevitably segfaults somewhere.

The downside here is that the calls to open and close the db, and therefore batch, now require a mutable reference, which means that higher-up references in ChainStore and PeerStore now need to be wrapped in mutexes. It's more cumbersome, but you could argue it's more belt-and-suspendery, since we're now much more sure it's safe to reallocate the DB size at the point it's being done. Also, I believe Windows will need this in place anyhow in order to resize, due to its aggressive file locking.

@yeastplume
Member Author

Also, I refactored the store itself to be more encapsulated and to ensure callers don't need to explicitly import the lmdb crate.

@antiochp
Member

The downside here is that the calls to open and close the db, and therefore batch now requires a mutable reference, which means that higher-up references in ChainStore and PeerStore now need to be wrapped in mutexes.

That would mean we lose any ability to have multiple readers on the db simultaneously?
Right now that's a nice benefit of LMDB that may be hard to give up (it lets multiple peer threads read the db for some early existence checks etc.)

@yeastplume
Member Author

I've just tried implementing as an RwLock here. How much of a performance hit is this likely to be for the peer store?

@@ -142,7 +141,7 @@ impl OrphanBlockPool {
/// maintains locking for the pipeline to avoid conflicting processing.
pub struct Chain {
db_root: String,
store: Arc<store::ChainStore>,
store: Arc<RwLock<store::ChainStore>>,
Contributor

I'm fine with the use of a RwLock but can't this be pushed down into our LMDB store? All regular operations would be considered a read, except for the close/open of the DB which would take the write.

Member Author

Took a bit of doing and testing, but think I've managed it in the latest push.

@yeastplume
Member Author

Think this is ready for merging if anyone wants to give a final review and a little thumbs up somewhere. Since the last comments I've:

  • Moved the RwLock into store::Store itself, and moved all the locks to occur before each read/write transaction. I've tested by setting the increment chunk size to a small value (1MB at a time) and syncing from scratch, doing loads of DB resizes along the way; the current iteration works without issue.
  • I've changed the allocation size logic to keep allocating (currently 128MB) chunks until at least 45% of total mapsize is free. No reason for choosing this value, but it can be tweaked at any stage.
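The arrangement in the first bullet might be sketched roughly as follows (illustrative stand-ins only, not the actual store::Store; reads take the shared lock, while the close/resize/reopen path takes the exclusive one):

```rust
use std::sync::RwLock;

/// Simplified stand-in for the real LMDB environment handle.
struct Database {
    mapsize: u64,
}

/// The RwLock lives inside the store, so callers never deal with it.
struct Store {
    db: RwLock<Option<Database>>,
}

impl Store {
    /// All regular operations take the read lock before their txn.
    fn mapsize(&self) -> Option<u64> {
        self.db.read().unwrap().as_ref().map(|db| db.mapsize)
    }

    /// Only the resize path takes the write lock, blocking until all
    /// readers are done; it then closes, resizes, and reopens the db.
    fn resize(&self, new_mapsize: u64) {
        let mut guard = self.db.write().unwrap();
        *guard = None; // "close" the db
        *guard = Some(Database { mapsize: new_mapsize }); // "reopen" resized
    }
}
```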

@ignopeverell (Contributor) left a comment

Very nice looking, I like that it removes the lmdb dependency in a few crates!

@yeastplume yeastplume merged commit beaae28 into mimblewimble:milestone/1.1.0 Feb 27, 2019
@antiochp
Member

I've just tried implementing as an RwLock here. How much of a performance hit is this likely to be for the peer store?

I suspect effectively zero given everything else going on.

@yeastplume yeastplume deleted the lmdb_resize branch March 4, 2019 10:04
@antiochp antiochp added the release notes To be included in release notes (of relevant milestone). label Jun 5, 2019
Labels: enhancement, release notes