Dynamic LMDB mapsize allocation [1.1.0] #2605
@@ -24,6 +24,10 @@ use lmdb_zero::LmdbResultExt;

use crate::core::ser;

/// number of bytes to grow the database by when needed
pub const ALLOC_CHUNK_SIZE: usize = 134_217_728; // 128 MB
const RESIZE_PERCENT: f32 = 0.9;

/// Main error type for this lmdb
#[derive(Clone, Eq, PartialEq, Debug, Fail)]
pub enum Error {
@@ -71,26 +75,13 @@ pub fn new_named_env(path: String, name: String, max_readers: Option<u32>) -> lm

	let mut env_builder = lmdb::EnvBuilder::new().unwrap();
	env_builder.set_maxdbs(8).unwrap();
	// half a TB should give us plenty room, will be an issue on 32 bits
	// (which we don't support anyway)
	#[cfg(not(target_os = "windows"))]
	env_builder.set_mapsize(5_368_709_120).unwrap_or_else(|e| {
		panic!("Unable to allocate LMDB space: {:?}", e);
	});
	//TODO: This is temporary to support (beta) windows support
	//Windows allocates the entire file at once, so this needs to
	//be changed to allocate as little as possible and increase as needed
	#[cfg(target_os = "windows")]
	env_builder.set_mapsize(524_288_000).unwrap_or_else(|e| {
		panic!("Unable to allocate LMDB space: {:?}", e);
	});

	if let Some(max_readers) = max_readers {
		env_builder
			.set_maxreaders(max_readers)
			.expect("Unable set max_readers");
	}

	unsafe {
		env_builder
			.open(&full_path, lmdb::open::NOTLS, 0o600)
@@ -120,6 +111,53 @@ impl Store {
		Store { env, db }
	}

	/// Determines whether the environment needs a resize based on a simple percentage threshold
	pub fn needs_resize(&self) -> Result<bool, Error> {
		let env_info = self.env.info()?;
		let stat = self.env.stat()?;

		let size_used = stat.psize as usize * env_info.last_pgno;
		trace!("DB map size: {}", env_info.mapsize);
		trace!("Space used: {}", size_used);
		trace!("Space remaining: {}", env_info.mapsize - size_used);
		let resize_percent = RESIZE_PERCENT;
		trace!(
			"Percent used: {:.*} Percent threshold: {:.*}",
			4,
			size_used as f64 / env_info.mapsize as f64,
			4,
			resize_percent
		);

		if size_used as f32 / env_info.mapsize as f32 > resize_percent
			|| env_info.mapsize < ALLOC_CHUNK_SIZE
		{
			trace!("Resize threshold met (percent-based)");
			Ok(true)
		} else {
			trace!("Resize threshold not met (percent-based)");
			Ok(false)
		}
	}

	/// Increments the database size by one ALLOC_CHUNK_SIZE
	pub fn do_resize(&self) -> Result<(), Error> {
Review comment: Should we check the available disk space and try to fail more gracefully? Or do you think that's more complexity than it's worth? It's trivial to do a check like that in C++, but I'm not sure whether Rust provides APIs for that.

Reply: I'd thought about this as well, but if you try to allocate more than the available disk space you get:

Which I think is graceful enough without having to add more complexity here.

Reply: Cool. I agree.
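For reference, if a pre-flight check were ever wanted, something along these lines could work. This is only a sketch and not part of the PR: it assumes the third-party fs2 crate (the standard library has no free-space API), and the helper name is made up.

```rust
// Hypothetical helper, not part of this PR. Requires the `fs2` crate in
// Cargo.toml; `fs2::available_space` reports the bytes available to the
// current user on the filesystem containing the given path.
use std::io;
use std::path::Path;

fn has_room_for_chunk(db_path: &Path, chunk_size: usize) -> io::Result<bool> {
	let available = fs2::available_space(db_path)?;
	Ok(available >= chunk_size as u64)
}
```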
		let env_info = self.env.info()?;
		let new_mapsize = if env_info.mapsize < ALLOC_CHUNK_SIZE {
			ALLOC_CHUNK_SIZE
		} else {
			env_info.mapsize + ALLOC_CHUNK_SIZE
		};
		unsafe {
			self.env.set_mapsize(new_mapsize)?;
Review comment: You need to make sure there are no active txns before resizing. See: https://github.com/monero-project/monero/blob/master/src/blockchain_db/lmdb/db_lmdb.cpp#L517-L539

Reply: Yep 👍 I'll look into enforcing that. I was thinking that calling it only from the batch creation function helps here, but of course that doesn't take multiple threads with open txns into account.

Reply: In the monero code it just seems to be implemented via a simple reference count on a global atomic. @antiochp @ignopeverell As far as I can see within the node, the Store struct is never wrapped in any mutexes. Can you confirm whether there are multiple threads trying to access the ChainStore or PeerStore at any given time? There's also a bit of an issue here with multiple wallet invocations trying to access the store at the same time, which is possible under the current architecture.

Reply: Are we talking about any txns here, or just write txns? (All lmdb access is via a read txn or a write txn.) If write txns, then lmdb itself is the mutex: it only supports a single write txn at a time (across all threads).

Reply: My understanding is that setting the db mapsize rearranges the indices, affecting reads too.

Reply: We don't actually have any mechanism to expose long-lived read txns via our store impl, so I think we're fine there; every read is effectively in its own read txn currently.
		}
		info!(
			"Resized database from {} to {}",
			env_info.mapsize, new_mapsize
		);
		Ok(())
	}

	/// Gets a value from the db, provided its key
	pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, Error> {
		let txn = lmdb::ReadTransaction::new(self.env.clone())?;

@@ -178,6 +216,10 @@ impl Store {

	/// Builds a new batch to be used with this store.
	pub fn batch(&self) -> Result<Batch<'_>, Error> {
		// check if the db needs resizing before returning the batch
		if self.needs_resize()? {
			self.do_resize()?;
		}
		let txn = lmdb::WriteTransaction::new(self.env.clone())?;
		Ok(Batch {
			store: self,
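Regarding the review discussion above about blocking active txns during a resize: below is a minimal sketch of one possible approach, loosely following monero's prevent_new_txns()/wait_no_active_txns() pattern. The TxnGate type and its method names are hypothetical and not part of this PR; the idea is simply that every txn holds a shared lock for its lifetime, while do_resize() takes the exclusive lock, which both waits for active txns to finish and blocks new ones from starting.

```rust
// Sketch only: a hypothetical guard for "no active txns while resizing".
use std::sync::{Arc, RwLock, RwLockReadGuard, RwLockWriteGuard};

#[derive(Clone)]
struct TxnGate(Arc<RwLock<()>>);

impl TxnGate {
	fn new() -> TxnGate {
		TxnGate(Arc::new(RwLock::new(())))
	}

	/// Taken at the start of every read/write txn; the guard is held for
	/// the lifetime of the txn.
	fn enter(&self) -> RwLockReadGuard<'_, ()> {
		self.0.read().unwrap()
	}

	/// Taken by do_resize(): returns only once no txn guard is held and
	/// prevents new txns from starting until it is dropped.
	fn exclusive(&self) -> RwLockWriteGuard<'_, ()> {
		self.0.write().unwrap()
	}
}
```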
@@ -0,0 +1,107 @@
// Copyright 2018 The Grin Developers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use grin_store as store;
use grin_util as util;

use grin_core::ser::{self, Readable, Reader, Writeable, Writer};

use std::fs;
use std::sync::Arc;

const WRITE_CHUNK_SIZE: usize = 20;
const TEST_ALLOC_SIZE: usize = store::lmdb::ALLOC_CHUNK_SIZE / 8 / WRITE_CHUNK_SIZE;

#[derive(Clone)]
struct PhatChunkStruct {
	phatness: u64,
}

impl PhatChunkStruct {
Review comment: 😹
	/// create
	pub fn new() -> PhatChunkStruct {
		PhatChunkStruct { phatness: 0 }
	}
}

impl Readable for PhatChunkStruct {
	fn read(reader: &mut Reader) -> Result<PhatChunkStruct, ser::Error> {
		let mut retval = PhatChunkStruct::new();
		for _ in 0..TEST_ALLOC_SIZE {
			retval.phatness = reader.read_u64()?;
		}
		Ok(retval)
	}
}

impl Writeable for PhatChunkStruct {
	fn write<W: Writer>(&self, writer: &mut W) -> Result<(), ser::Error> {
		// write many times
		for _ in 0..TEST_ALLOC_SIZE {
			writer.write_u64(self.phatness)?;
		}
		Ok(())
	}
}

fn clean_output_dir(test_dir: &str) {
	let _ = fs::remove_dir_all(test_dir);
}

fn setup(test_dir: &str) {
	util::init_test_logger();
	clean_output_dir(test_dir);
}

#[test]
fn lmdb_allocate() -> Result<(), store::Error> {
	let test_dir = "test_output/lmdb_allocate";
	setup(test_dir);
	// Allocate more than the initial chunk, ensuring
	// the DB resizes underneath
	{
		let env = Arc::new(store::new_env(test_dir.to_owned()));
		let store = store::Store::open(env.clone(), "test1");

		for i in 0..WRITE_CHUNK_SIZE * 2 {
			println!("Allocating chunk: {}", i);
			let chunk = PhatChunkStruct::new();
			let mut key_val = format!("phat_chunk_set_1_{}", i).as_bytes().to_vec();
			let batch = store.batch()?;
			let key = store::to_key(b'P', &mut key_val);
			batch.put_ser(&key, &chunk)?;
			batch.commit()?;
		}
	}
	// Open env again and keep adding
	{
		println!("***********************************");
		println!("***************NEXT*****************");
		println!("***********************************");
		let env = Arc::new(store::new_env(test_dir.to_owned()));
		let store = store::Store::open(env.clone(), "test1");

		for i in 0..WRITE_CHUNK_SIZE * 2 {
			println!("Allocating chunk: {}", i);
			let chunk = PhatChunkStruct::new();
			let mut key_val = format!("phat_chunk_set_2_{}", i).as_bytes().to_vec();
			let batch = store.batch()?;
			let key = store::to_key(b'P', &mut key_val);
			batch.put_ser(&key, &chunk)?;
			batch.commit()?;
		}
	}

	Ok(())
}
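A note on the test's sizing, ignoring key and page overhead: each PhatChunkStruct serializes TEST_ALLOC_SIZE u64 values, i.e. ALLOC_CHUNK_SIZE / 8 / WRITE_CHUNK_SIZE values of 8 bytes each, or roughly ALLOC_CHUNK_SIZE / WRITE_CHUNK_SIZE ≈ 6.7 MB per commit. The WRITE_CHUNK_SIZE * 2 = 40 commits in each scoped block therefore write on the order of 2 × ALLOC_CHUNK_SIZE ≈ 268 MB, which should force the store through at least one resize per run.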
Review comment: Seems a bit low. When Grin has full blocks, this could happen every hour or so, especially for archive peers. I'd be curious to see some metrics around how long a resize takes. I'd also be interested to see how disruptive mdb_txn_safe::prevent_new_txns() and mdb_txn_safe::wait_no_active_txns() are (see comment on line 152). If either of those operations has a noticeable effect on performance, it'd be better to resize less often.

Reply: Sure, I can try to collect some metrics once that's implemented.

Reply: Just another thought here: it might be debatable whether this is the right size for the chain, but it's already too large for the wallet and peer DB. I might think about making this a parameter somehow without adding too much cruft.

Reply: Good point. Maybe we should consider @antiochp's doubling proposal? Or will that eventually be too excessive?

Reply: Seems easy enough to tune once we have live data.
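As a rough illustration of the doubling idea discussed above (not what this PR implements), the growth step could scale with the current map size and be capped, so a small wallet or peer DB never jumps straight to a 128 MB allocation. The constants and the function name here are made up for the sketch.

```rust
// Sketch only: hypothetical growth policy that doubles the map until a cap,
// then grows by a fixed step.
const MIN_MAPSIZE: usize = 8 * 1024 * 1024; // 8 MB floor for small DBs
const MAX_GROWTH_STEP: usize = 128 * 1024 * 1024; // max increase per resize

fn next_mapsize(current: usize) -> usize {
	if current < MIN_MAPSIZE {
		MIN_MAPSIZE
	} else {
		// double the map, but cap the increment at MAX_GROWTH_STEP
		current + current.min(MAX_GROWTH_STEP)
	}
}
```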