new blooms database #8712
Conversation
A couple of questions.
I wrote about this here: Adaptive Enhanced Bloom Filters.
```diff
 use hash::keccak;
-use kvdb::{KeyValueDB, DBTransaction};
+use kvdb::{DBTransaction};
 use kvdb_memorydb;
```
Remove blocks around a single import
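For illustration, the braceless form being suggested here (a sketch, not part of the diff):

```rust
// before: braces around a single import
// use kvdb::{DBTransaction};
// after:
use kvdb::DBTransaction;
```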
```diff
 use rand::Rng;
-use kvdb::{KeyValueDB, DBValue};
+use kvdb::{DBValue};
```
Remove blocks around a single import
```diff
 use snappy;
-use kvdb::{KeyValueDB, DBTransaction};
+use kvdb::{DBTransaction};
 use kvdb_memorydb;
```
Remove blocks around a single import
```diff
 use snapshot::{self, ManifestData, SnapshotService};
 use spec::Spec;
-use test_helpers::{self, generate_dummy_client_with_spec_and_data};
+use test_helpers::{generate_dummy_client_with_spec_and_data};
```
Remove blocks around a single import
```diff
 use io::IoChannel;
-use kvdb_rocksdb::{Database, DatabaseConfig};
+use kvdb_rocksdb::{DatabaseConfig};
```
Remove blocks around a single import
```diff
 use std::sync::Arc;
 use parking_lot::RwLock;
-use kvdb::{KeyValueDB, KeyValueDBHandler};
+use kvdb::{KeyValueDB};
```
Remove blocks around a single import
```rust
cache_manager.collect_garbage(current_size, |ids| {
	for id in &ids {
		match *id {
```
Use if let instead of a match on a single pattern?
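A minimal sketch of the suggested change (with a hypothetical `CacheId` enum, not the PR's actual types):

```rust
enum CacheId {
	Bloom(u64),
	Other,
}

fn note(id: &CacheId) {
	// a match with one interesting arm...
	match *id {
		CacheId::Bloom(key) => println!("bloom {}", key),
		_ => {}
	}
	// ...reads more directly as `if let`:
	if let CacheId::Bloom(key) = *id {
		println!("bloom {}", key);
	}
}
```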
@niklasad1 this pr is still in progress and quite far away from being ready for a review :)
They will, especially top level blooms. But the cost of checking them is so low that it doesn't matter. Currently we use exactly the same bloom filters with three layers for filtering events. The bottleneck is rocksdb io, and this pr is trying to address only that problem. :)
Yes, that's a possible improvement.
You can use transaction traces exactly for that. They already utilize the same bloom filters.
The reason I'm suggesting adding every address to the bloom is to avoid having to request traces. If one is trying to do a full accounting of a particular address (or group of addresses), one needs to visit every trace, but if the blooms included all addresses involved in a block, one could use them to avoid a huge number of trace requests. Requesting traces (especially across the 2016 DDoS transactions) is brutal. Did you have a chance to glance at the paper I mentioned above? With that method, I can collect full lists of transactions per account about 100 times faster than reading every trace. The current blooms don't include every address, so they miss a lot of blocks if one wants a full list of transactions.

Also, one thing I found was that because the block-level blooms "roll up" the transaction-level blooms, two unfortunate things happen: (1) the block-level blooms are becoming saturated, so they report more false positives than you might want (this is exacerbated in the hierarchical scheme you suggest without widening the mid- and upper-level blooms), and (2) the receipt-level blooms are way too undersaturated and therefore take up a ton of space that isn't well utilized.
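(An aside, not from the thread: the saturation effect follows from the usual approximation for a Bloom filter's false-positive rate with m bits, k hash functions, and n inserted items. Rolling many transaction blooms up into one fixed-width block bloom grows n while m stays fixed, so p climbs quickly.)

```latex
p \approx \left(1 - e^{-kn/m}\right)^k
```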
…, cause fs::File is just a shared file handle
debris left a comment:
I think it's ready for review
```rust
fn open(&self, db_path: &Path) -> Result<Arc<BlockChainDB>, Error> {
	let key_value = Arc::new(kvdb_rocksdb::Database::open(&self.config, &db_path.to_string_lossy())?);
	let blooms_path = db_path.join("blooms");
	let trace_blooms_path = db_path.join("trace_blooms");
```
I'm not sure if this is a good location for the blooms db. I'm open to any other suggestions.
```diff
 self.importer.miner.clear();
 let db = self.db.write();
-db.restore(new_db)?;
+db.key_value().restore(new_db)?;
```
Because blooms-db is placed in the same directory, it also moves it. Maybe we should move this function out of the KeyValueDB interface to make it more descriptive.
```diff
-db.restore(new_db)?;
+db.key_value().restore(new_db)?;
```

```rust
// TODO: restore blooms properly
```
ah, I forgot to reopen the blooms db, will fix it
not yet, that's the only remaining thing :)
```rust
// Some(3) -> COL_EXTRA
// 3u8 -> ExtrasIndex::BlocksBlooms
// 0u8 -> level 0
let blooms_iterator = db.key_value()
```
from ethcore/src/blockchain/extras.rs
```rust
// Some(4) -> COL_TRACE
// 1u8 -> TraceDBIndex::BloomGroups
// 0u8 -> level 0
let trace_blooms_iterator = db.key_value()
```
from ethcore/src/trace/db.rs
sorpaas left a comment:
LGTM overall. Just not sure about some aspects regarding the database consistency issues.
```rust
});

for (number, blooms) in blooms_iterator {
	db.blooms().insert_blooms(number, blooms.iter())?;
```
insert_blooms is not atomic. If any of the top, mid, or bot file writes fails, the blooms db basically enters a corrupt state -- we will have top-level blooms saying something exists that cannot be found in mid or bot (a false negative). So I think it may be better to just panic (expect) here like we do in blockchain.rs?
I don't think it matters, even if it ends up in the corrupt state. On the next parity launch the migration will start from scratch; corrupted data will be overwritten and the database state will be consistent.
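(An aside illustrating the failure mode being discussed: a hedged sketch of the hierarchical query path, with a hypothetical helper over an ethbloom-like API. A spurious bit at a higher level only costs one extra check below it, a harmless false positive; a missing bit at mid or bot for data that actually exists hides real blocks, a false negative.)

```rust
use ethbloom::Bloom;

// The query succeeds only if every layer agrees the needle may be present,
// so a bloom missing from `mid` or `bot` makes real data unfindable.
fn may_contain(top: &Bloom, mid: &Bloom, bot: &Bloom, needle: &Bloom) -> bool {
	top.contains_bloom(needle) && mid.contains_bloom(needle) && bot.contains_bloom(needle)
}
```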
```rust
// constant forks may lead to an increased ratio of false positives in bloom filters
// since we do not rebuild top or mid level, but we should not be worried about that
// most of the time events at block n(a) occur also on block n(b) or n+1(b)
self.top.accrue_bloom::<ethbloom::BloomRef>(pos.top, bloom)?;
```
Regarding the possible blooms-db false-negative corruption: given that we only insert one bloom at a time, a simple solution may be to create a metadata file storing the current writing bloom position before attempting the insertion. Then, if we hit any error case below, the next time the program starts we would only need to rewrite that particular bloom position to fix the corruption.

This doesn't solve the disk consistency issue, though. Otherwise we would need to flush within the for loop, which may be a bad performance penalty. So not sure whether we would want this.
> Then, if we hit any error case below, the next time the program starts we would only need to rewrite that particular bloom position to fix the corruption.
That's what I did initially. Then I thought about the problem again and realised that none of this matters. The existence of blooms is not consensus critical. If one of the writes fails (or partially fails), the particular block will be reimported on the next launch of parity and the database will overwrite the corrupted data.
```rust
///
/// This database does not guarantee atomic writes.
pub struct Database {
	database: Mutex<db::Database>,
```
Any reason we use Mutex and not RwLock? I think it would be safe for multiple parties to read blooms at the same time?
Great question! There is a reason: https://users.rust-lang.org/t/how-to-handle-match-with-irrelevant-ok--/6291/15
tl;dr
Reading a file mutates the object that represents the file stream, namely by advancing its position. The very data being mutated is stored in kernel space, and simultaneous mutation won't cause the kernel any trouble because it is smart enough to synchronize accesses.
Still, two threads simultaneously reading the same file stream will break internal logic. (I guess even a single read() call on BufReader could yield a non-contiguous slice.)
That's why I made the iterator function of db::Database mut.
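(To make the argument concrete, a minimal sketch assuming a Mutex-wrapped file handle like the one quoted above; the type and helper names are hypothetical.)

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::sync::Mutex;

struct Blooms {
	// Mutex rather than RwLock: even "reads" advance the file cursor,
	// which is shared mutable state behind the handle.
	file: Mutex<File>,
}

impl Blooms {
	// Seek + read must happen as one unit under the lock; two threads
	// interleaving seeks on the same handle would corrupt each other's reads.
	fn read_at(&self, offset: u64, buf: &mut [u8]) -> std::io::Result<()> {
		let mut file = self.file.lock().unwrap();
		file.seek(SeekFrom::Start(offset))?;
		file.read_exact(buf)
	}
}
```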
Please add a ✔️ tick if you think this looks good.
```diff
 /// Returns numbers of blocks containing given bloom.
-fn blocks_with_bloom(&self, bloom: &Bloom, from_block: BlockNumber, to_block: BlockNumber) -> Vec<BlockNumber>;
+fn blocks_with_bloom<'a, B, I, II>(&self, blooms: II, from_block: BlockNumber, to_block: BlockNumber) -> Vec<BlockNumber>
+	where BloomRef<'a>: From<B>, II: IntoIterator<Item = B, IntoIter = I> + Copy, I: Iterator<Item = B>, Self: Sized;
```
I think that each trait definition deserves its own line (to make it easier to read), e.g.:

```rust
where
	BloomRef<'a>: From<B>,
	II: IntoIterator<Item = B, IntoIter = I> + Copy,
	I: Iterator<Item = B>,
	Self: Sized;
```

```rust
	.map(|b| b as BlockNumber)
	.collect()
fn blocks_with_bloom<'a, B, I, II>(&self, blooms: II, from_block: BlockNumber, to_block: BlockNumber) -> Vec<BlockNumber>
	where BloomRef<'a>: From<B>, II: IntoIterator<Item = B, IntoIter = I> + Copy, I: Iterator<Item = B> {
```
I think that each trait definition deserves its own line (to make it easier to read), e.g.:

```rust
where
	BloomRef<'a>: From<B>,
	II: IntoIterator<Item = B, IntoIter = I> + Copy,
	I: Iterator<Item = B>,
	Self: Sized;
```
```rust
match is_done {
	true => {
		db.flush().map_err(UtilError::from)?;
		// TODO: flush also blooms?
```
As I understand it, the API doesn't provide flushing, but it does flush every time new blooms are inserted?!
If so, clarify the comment; alternatively, change the API and actually flush!
that's an obsolete comment ;)
Yes, this db doesn't provide flushing, because writes are non-atomic and happen immediately when someone calls insert.
```rust
// config,
bloom_config: BloomConfig,
// tracing enabled
cache_manager: RwLock<CacheManager<H256>>,
```
Missing docs for cache_manager
ahh... it's not a public field and no change has been made here :p but I'll add a description ;p
```rust
for key in blooms_keys {
	self.note_used(CacheId::Bloom(key));
}
// TODO: replace it with database for trace blooms
```
Is this supposed to be fixed in this PR?
oh, yes, obsolete comment 😅
```rust
let numbers = chain.filter(filter);
let possibilities = filter.bloom_possibilities();
let numbers = self.db.trace_blooms()
	.filter(filter.range.start as u64, filter.range.end as u64, &possibilities).expect("TODO: blooms pr");
```
If this is fixed, change the expect message!
```rust
let blooms = rlp::decode_list::<Bloom>(&group);
(number, blooms)
});
```
This code is identical to #L38-#L52 except for the variables; consider moving it to a function instead?
it's not exactly the same; notice that the endianness is completely different. Also, the function would have 4 variables and imo would be less readable than what we have here. As this is a migration, this code will never be modified again.
Ok, makes sense then NVM!
```toml
[package]
name = "blooms-db"
version = "0.1.0"
authors = ["debris <marek.kotewicz@gmail.com>"]
```
This should be admin@parity.io?
Forgot one thing: also add a license. I guess it should be license = "GPL-3.0".
I believe it doesn't matter for subcrates :)
actually, you are right. We usually have a license field so I'll also add it :)
niklasad1 left a comment:
Overall, a really high-quality PR but some minor things to fix!
@niklasad1 @sorpaas I addressed grumbles. Can you review the pr again and mark it as accepted (if it's good)? :)
also, I merged master:
* 'master' of https://github.com/paritytech/parity: new blooms database (openethereum#8712) ethstore: retry deduplication of wallet file names until success (openethereum#8910) Update ropsten.json (openethereum#8926)
* master: new blooms database (#8712)
This pr introduces a new database for header blooms.
Why?
The only time we access blooms is when we are filtering logs. Filtering logs requires iteration over all blooms in the database, and this is very inefficient with RocksDB. As the blockchain got bigger and bigger, it took more and more time to filter all blooms starting from the genesis block.
How?
I built an increment-only, sequential database for storing blooms. It consists of three database layers:
top, mid and bot.

- the bot layer contains all header blooms
- the mid layer contains blooms created from 16 consecutive blooms of the bot layer
- the top layer contains blooms created from 16 consecutive blooms of the mid layer

Blooms at each layer are placed next to each other on disk, so it's very efficient to find (~O(1)) and iterate (~O(1)) over them. It's a huge improvement compared to the previous database implementation, where every lookup and step of iteration was probably O(log n).
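(A sketch, not the PR's code, of why lookups are ~O(1) under this layout; it assumes 256-byte blooms packed contiguously per layer, and the helper name is hypothetical.)

```rust
const BLOOM_SIZE: u64 = 256; // bytes per serialized bloom

/// Byte offsets of the (top, mid, bot) blooms covering `block`.
fn positions(block: u64) -> (u64, u64, u64) {
	let bot = block * BLOOM_SIZE;         // one bloom per block
	let mid = (block / 16) * BLOOM_SIZE;  // one bloom per 16 blocks
	let top = (block / 256) * BLOOM_SIZE; // one bloom per 256 blocks
	(top, mid, bot)
}
```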
size of a database (for 1000000 blocks)

- bot layer - 256mb (1000000 * size_of_bloom (256b))
- mid layer - 16mb (bot / 16)
- top layer - 1mb (mid / 16)

Other issues addressed
This pr also addresses "Blooms recomputation in case of a fork is very slow" (#8552) by simply not recomputing blooms at higher levels. Why? Because in case of a fork, it's likely that the same transaction on a different chain will be included in a block of the same number or in one of the following blocks. As long as the distance between the block on chain a and the block on chain b is on average significantly lower than 128, the false-positive ratio of the bloom filter should not increase. And even if the
top and mid blooms return a false positive, the underlying bottom bloom is replaced with a new one, so this will not lead to incorrect results.

https://github.com/paritytech/parity/blob/343b29866c4cc592afa9038995217b6ba99e70fa/util/blooms-db/src/db.rs#L96-L98
This pr also addresses all issues with bloom recomputation on forks. Recomputation does not happen and there is no in-memory cache, so blooms are always valid.
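(A hedged sketch of that rule, assuming an ethbloom-like Bloom type rather than the PR's actual db code: on reimport, top and mid only accrue (bitwise OR) the new bloom, while bot is replaced outright, so stale higher-level bits can only cause extra checks, never wrong results.)

```rust
use ethbloom::Bloom;

fn reinsert(top: &mut Bloom, mid: &mut Bloom, bot: &mut Bloom, new: &Bloom) {
	// higher levels accumulate: fork leftovers add false positives at worst
	top.accrue_bloom(new);
	mid.accrue_bloom(new);
	// the bottom level is authoritative and overwritten on reimport
	*bot = new.clone();
}
```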
todo list (in this pr)
- … (BlockChainDB)
- BlockChainDB should not expose &RwLock in the interface
- … bloomchain crate

benchmarks
So far I have run only a single benchmark for blooms-db. The results are promising, but I believe they do not fully show how much everything has improved.
benchmark from old stackoverflow thread
old parity:
this branch:
This pr should also reduce disk writes by roughly 70gb when doing a full sync.