Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@cheme
Copy link
Contributor

@cheme cheme commented Oct 7, 2019

This is ongoing work on a new storage kind for substrate. I do open the PR to follow the progress, and get feedback.
The amount of code start to grow, and I am at a point where I want to stabilize, clean and document code, for this putting some roadmap in place will help.

Seems also like a good time to start design discussion.

Description

Some info needs to be in synch with blockchain state (information accessible or any block), but does not require the proof property.

This PR intends to have such storage. Design goals are:

  • lower overhead than trie storage: we store directly key values directly (with the need to have some history management: so double indexing and a bit of history reading).
  • synch of info up to the extrinsics call:
    • support revert when evaluating an extrinsic, and state update for next extrinsic
    • support chain reorg

Basically the same thing as the existing state trie storage without the trie proof overhead.

Use case

  • child trie ids:

Child trie key isolation can be based on parent storage path, but this does not play well with move or delete child trie operations. So associating a unique id with every child trie seems needed (see #2209). Yet those uniqueids are only an implementation details and should not be part of the state (and in case of full storage with pruning it could not (this case does not exists at this time but indicate a severe design limitation of #2209)).
Using this storage for those child trie unique id (or keyspace) seems fine. For instance, a child trie deletion and then re-creation (at the same address in the same transaction) is doable : deletion register the old unique id for pruning and delete the association from the new storage, creation get a new unique id for the new child trie so no delta for pruning is needed (global id for full history with pruning or index from the same storage if only using canonical).

An alternative to get back to #2209 state would be to use storage cache, but it is a bit less flexible (no direct access from statemachine overlay, so it would be temporary code until move or delete operation are really needed), and it is subject to exploit (as his the current implementation) : stores one rocksdb key value per block history as a linked list.

  • offchain local storage:

cc\ @todr , iirc there is such local storage in offchain storage api but not yet implemented.
It seems to me this replies the the local offchain storage, it will requires exposing the extrinsic (see TODOs), and probably handling bigger content (see TODOs on indexing and maybe handling file as content). Also maybe adding the per key lock that is in place in offchain storage and not needed in others use cases.

  • any technical info that can be different between client, eg current use cases of storage-cache.

  • info that should remain identical between client but does not fit a trie storage (we still need to synch some proof on the trie). can even be extended to state storage with lazy proof access (or only a few recalculated trie levels).

TODOs in this PR

  • overlay db handle: push transaction at the end of a block execution
  • state-db branches local storage, usage of branch index
  • state-db handling tree history of data. Indexing by branch index and block number.
  • canonical storage: simple indexing by block number, no need for branch index.
  • client testing (canonical): client test should be extended, notably assert the block number send from state-db.
  • clean code.

TODOs for other PR

Those TODOs even if a bit to much to be reviewable in this pr are strictly needed for this storage.

  • lazy pruning key set (no need to keep the set in memory). Additionally we may be pruning to often: since we do not have to maintain a deathrow delta having less frequent pruning can be fine.
  • Genesis storage initialization (may be needed for client test) and 'eset_storage'
  • Serialized implementation scaling (do not store full history in one rocksdb value (obviously don't scale), but implement two strategie:
    - linear: size limite range index and value: linked list of range with possibly included values in range.
    - mmr: range with indexing of range as in mmr allows quick access for latest blocks and reduce significantly access for very big history (eg a block index). Size limits similar to previous impl (may need to allow splitting mmr nodes index too).
    - bench with different size limits.
  • Full history mode (archive all):
    - serialized with history: likely a first indexing by branch index and canonical only indexing behind it.
    - storing association block -> branch index : that was done in one of my previous branch (client-db-ix) where I mistakenly implements branch index management at client level.
    - storing last branch ix at client level.
    - transmitting branchix in commitset of state-db and initiating state-db at right branch index
    - this mode is way less efficient than standard canonical only mode so there could be a plugged canonical only storage mode (same as current) with complement to the full trie indexed one.
  • storage cache: quite needed.

Optional TODOs

  • Api similar to offchain, add a prefix, have reserved columns for some static prefix (or add both column and prefix).
    - means that result transaction will also be grouped by those.
    - means adding extrinsics to query the storage (such ext will probably be needed soon for testing (but gated behind a feature).

  • full history with pruning: currently all non canonical state is only use in state-db in memory. Implementing full history for this storage means basically that those key do not have to be in memory until canonicalisation. This is not possible currently for state db trie because it will result in key collision. Still a costy approach would be to extend trie library to add the couple branch number block number to the encoded nodes in the underlying db (we already prefix with the node partial path). Then stored branch will becomes [Encoded branch ++ array of child (branch ix + block ix)]. This will not change hash calculation, but would certainly be a bit costy memory wise (two u64 per child, a bit less if we do some small optim like last childs if undefined are same as previous). This could be use along with a canonical compact encoding, but it would involve a pretty heavy cannonicalisation process (rewrite all branch pointer and node index for canonicalized content).

Also this pr could be split for review:

  • overlay (all change in state-machine)
  • statedb (state-db change with tree historied data)
  • client code (client with linear historied data)

This could also be organized in a clean 3 commits history.

complete some test method by providing child access to values.
is.
Need to use historied values, maybe remove pinned, and put some context
as parameter for get_offstate
probably need to store state, apply gc on state and depending on result
restore state or actually gc historied values.
for gc. Need to change commit set to contain an history of values and
adjust test db.
@cheme cheme added the A3-in_progress Pull request is in progress. No review needed at this stage. label Oct 7, 2019
Comment on lines 75 to 77
/// Internal buffer, it is either a readonly view other the
/// serialized data or the pending changes.
type SerializedBuff<'a> = Cow<'a, [u8]>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does that means that if cow is borrowed variant then this is a readonly view other the serialized data and if it is owned then it is a pending change ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually doesn't seems to have more than Cow semantic, the type renaming is for me more confusing than helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, removing the alias could be better, will do.

Comment on lines 128 to 130
pub fn into_owned(self) -> Serialized<'static, F> {
Serialized(Cow::from(self.0.into_owned()), PhantomData)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then this contradict my understanding of serialized, this is owned variant and this is not a readonly view over some serialized data ? no ?

Copy link
Contributor

@gui1117 gui1117 Oct 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this function seem never used, nor it is documented, also Serialized implements Clone and the only difference between this and clone is that internally the value is owned instead of borrowed but user of linear don't access this so from his point of view there is no difference

Comment on lines 202 to 203
// truncate here can be bad
self.0.to_mut().truncate(start_ix + SIZE_BYTE_LEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how bad ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to much :), I dropped the comment (just rewrite with copy_within)

Copy link
Contributor

@gui1117 gui1117 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linear is reviewed :-)

self.0.to_mut().copy_within(start_from..start_from + size, start_to);
}

// Usize encoded as le u64 (for historical value).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Usize encoded as le u64 (for historical value).
// Read u64 encoded as le (for history index).

what do you mean by Usize encoded as le u64, history index is u64 itself

cheme and others added 2 commits October 29, 2019 13:26
Co-Authored-By: thiolliere <gui.thiolliere@gmail.com>
/// Trait defining a state for querying or modifying a tree.
/// This is a collection of branches index, corresponding
/// to a tree path.
pub trait BranchesStateTrait<S, I, BI> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the point of having S generic if it only implemented for boolean

Comment on lines +54 to +70
/// Trait defining a state for querying or modifying a branch.
/// This is therefore the representation of a branch state.
pub trait BranchStateTrait<S, I> {

/// Get state for node at a given index.
fn get_node(&self, i: I) -> S;

/// Get the last index for the state, inclusive.
fn last_index(&self) -> I;
}

/// This is a simple range, end non inclusive.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BranchRange {
pub start: u64,
pub end: u64,
}
Copy link
Contributor

@gui1117 gui1117 Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: maybe this could make use of standard Range structures and traits as S is always a boolean

Comment on lines +139 to +143
I: Copy + Eq + TryFrom<u64> + TryInto<u64>,
BI: Copy + Eq + TryFrom<u64> + TryInto<u64>,
{
if let Some((state_branch, state_index)) = state.iter().next() {
let state_index_u64 = saturating_into(state_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure of the interest of having I being generic if is it to saturate it anyway.

@cheme cheme added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Oct 31, 2019
/// In-memory backend. Fully recomputes tries on each commit but useful for
/// tests.
pub struct InMemory<H: Hasher> {
inner: HashMap<Option<Vec<u8>>, HashMap<Vec<u8>, Vec<u8>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be renamed trie_inner or trie

pub struct InMemory<H: Hasher> {
inner: HashMap<Option<Vec<u8>>, HashMap<Vec<u8>, Vec<u8>>>,
trie: Option<TrieBackend<MemoryDB<H>, H>>,
kv: Option<InMemoryKvBackend>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must be renamed kv_inner or kv

inner: HashMap<Option<Vec<u8>>, HashMap<Vec<u8>, Vec<u8>>>,
trie: Option<TrieBackend<MemoryDB<H>, H>>,
kv: Option<InMemoryKvBackend>,
state: Option<StateBackend<MemoryDB<H>, H, InMemoryKvBackend>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once you merge master you could renamed it with a more obvious name like as_state_backend_place_holder but I can't really find a good name, but at least the comment

kv
} else {
self.state.as_ref().unwrap().kv_backend()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else condition doesn't make sense, state is only used in the function for returning the immutable reference in as_state_backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember why I use it this way, it was simply to avoid copy when producing the 'as_state_backend' : instead of cloning the values as we do for the trie part it is moving the container.
So after using as_state_backend this is getting use instead of the kv field. I could probably switch to simply cloning the kv field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this is correct indeed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to have some comment about this complex behavior

inner: inner,
trie: None,
state: None,
kv: Some(Default::default()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this from sounds now a bit weird to me because it is not obvious if it should make an InMemory with the kv storage or without the kv storage. usually I prefer a new function that makes it obvious. However I'm ok with this. But if we really want to make kv optional then add a doc in the from implementation stating that kv is activated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is only to avoid cloning it, would probably be better with a box. But yes the option here does not have meaning, in current state you need either kv being Some or state being Some.
But yes I could probably just clone.

/// in the backend, and produce a "transaction" that can be used to commit.
/// Does include child storage updates.
fn full_storage_root<I1, I2i, I2>(
fn full_storage_root<I1, I2i, I2, I3>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the storage root takes kv_deltas ? kv shouldn't influe the root in any way no ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes kv does not influe in the root, it is mainly here to complete the transaction.
Ideally we could have a function that does only the root: this would be good from my point of view (there is quite a few place where we do not use the produced transaction).

pub struct BlockImportOperation<Block: BlockT, H: Hasher> {
old_state: CachingState<Blake2Hasher, RefTrackingState<Block>, Block>,
db_updates: PrefixedMemoryDB<H>,
db_updates: (PrefixedMemoryDB<H>, InMemoryKvBackend),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is better to avoid tuple here ? you could do trie_db_update and kv_db_update.
or maybe create a new struct ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a bit of refactoring, but those key value pr should try to get as decoupled as possible from the trie state (the opposite of what I did).
That would be a big but probably needed refactoring, I am switching those PR state to in_progress thus.

if change.is_some() {
(Ser::default(), true)
} else {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an error isn't it ? it mustn't happen that a key doesn't exist in db while being in the commit marked at unchanged, maybe

Suggested change
break;
break; // This is an invalid state, but nothing we can do

or better log something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it is the case of deleting a value that is not here.

and few doc and naming changes.
@cheme
Copy link
Contributor Author

cheme commented Nov 11, 2019

Closing this PR because Child trie associated issue will be work around in a way that does not require this.
It make this a non prioritary thing.
Generally it adds complexity, and would require its own crate separation.
Some other part could maybe get merge.
Rethinking the design could also be interesting.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A3-in_progress Pull request is in progress. No review needed at this stage.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants