Rewrite fork tree functions using loops. #7337

expenses · 2020-10-16T08:48:44Z

This PR is intended to fix #5998. Flaming Fir currently panics when --light is used on the debug build, but not on the release build.

import is done, find_node_index_where is in progress.

expenses · 2020-10-16T08:54:37Z

~~@andresilva do you think you could help me set up a set of tests for find_node_index_where? I'm struggling to understand why something like:~~

#[test]
fn find_index_where() {
	let (mut tree, is_descendent_of) = test_fork_tree();

	assert_eq!(
		tree.find_node_index_where(&"A", &1, &is_descendent_of, &|&()| true),
		Ok(Some(vec![0]))
	);
}

~~panics with:~~

---- test::find_index_where stdout ----
thread 'test::find_index_where' panicked at 'assertion failed: `(left == right)`
  left: `Ok(None)`,
 right: `Ok(Some([0]))`', utils/fork-tree/src/lib.rs:1004:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

expenses · 2020-10-16T09:59:30Z

I've written out some tests for that now.

bkchr · 2020-10-16T10:24:27Z

Why did you remove the custom decode implementation?

bkchr · 2020-10-16T10:24:42Z

(The one that you have written)

expenses · 2020-10-16T10:28:33Z

> Why did you remove the custom decode implementation?

I haven't seen a node crash because of that yet, but I'll put it back if I encounter that.

bkchr · 2020-10-16T10:43:46Z

Please bring it back. I have seen this. First the node crashes because of import and on restart the decode fails.

expenses · 2020-10-16T14:56:09Z

I've been rewriting the find_node_index_where function and have been fuzzing the modified implementation against the original to check for bugs. In doing so, I've come across a situation that I think my implementation handles correctly while the original does not. I've added it as a test, find_node_where_specific_value:

#[test]
fn find_node_where_specific_value() {
	let mut tree = ForkTree::new();

	//
	// A - B
	//  \
	//   — C
	//
	let is_descendent_of = |base: &&str, block: &&str| -> Result<bool, TestError> {
		match (*base, *block) {
			("A", b) => Ok(b == "B" || b == "C" || b == "D"),
			("B", b) | ("C", b) => Ok(b == "D"),
			("0", _) => Ok(true),
			_ => Ok(false),
		}
	};

	tree.import("A", 1, 1, &is_descendent_of).unwrap();
	tree.import("B", 2, 2, &is_descendent_of).unwrap();
	tree.import("C", 2, 4, &is_descendent_of).unwrap();

	assert_eq!(
		tree.find_node_where(&"D", &3, &is_descendent_of, &|&n| n == 4)
			.map(|opt| opt.map(|node| node.hash)),
		Ok(Some("C"))
	);
}

and it fails with:

---- test::find_node_where_specific_value stdout ----
thread 'test::find_node_where_specific_value' panicked at 'assertion failed: `(left == right)`
  left: `Ok(None)`,
 right: `Ok(Some("C"))`', utils/fork-tree/src/lib.rs:1755:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

at present. However, I believe this test should pass, as D is a descendant of C. @andresilva What do you think?
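For context, a differential check of the kind described above (running a loop-based rewrite against the recursive original on generated trees) could be sketched roughly as follows. This is not the harness used in this PR: `Node`, `Lcg`, `find_recursive`, `find_iterative` and `count_mismatches` are hypothetical names, and the search is a simplified stand-in for find_node_index_where that returns the child-index path to the first pre-order match.

```rust
struct Node {
    data: u32,
    children: Vec<Node>,
}

/// Small deterministic generator (an LCG) so the sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 33) as u32
    }
}

/// Builds a small random tree with at most `depth` levels below the root.
fn random_tree(rng: &mut Lcg, depth: u32) -> Node {
    let fanout = if depth == 0 { 0 } else { rng.next_u32() % 4 };
    Node {
        data: rng.next_u32() % 16,
        children: (0..fanout).map(|_| random_tree(rng, depth - 1)).collect(),
    }
}

/// Reference implementation: recursive pre-order search.
fn find_recursive(node: &Node, target: u32) -> Option<Vec<usize>> {
    if node.data == target {
        return Some(Vec::new());
    }
    for (i, child) in node.children.iter().enumerate() {
        if let Some(mut path) = find_recursive(child, target) {
            path.insert(0, i);
            return Some(path);
        }
    }
    None
}

/// Candidate implementation: the same search with an explicit stack.
fn find_iterative(root: &Node, target: u32) -> Option<Vec<usize>> {
    let mut stack = vec![(root, Vec::new())];
    while let Some((node, path)) = stack.pop() {
        if node.data == target {
            return Some(path);
        }
        // Push children in reverse so that child 0 is popped first.
        for (i, child) in node.children.iter().enumerate().rev() {
            let mut child_path = path.clone();
            child_path.push(i);
            stack.push((child, child_path));
        }
    }
    None
}

/// Runs both implementations on generated trees and counts disagreements.
pub fn count_mismatches(iterations: u32, seed: u64) -> u32 {
    let mut rng = Lcg(seed);
    let mut mismatches = 0;
    for _ in 0..iterations {
        let tree = random_tree(&mut rng, 5);
        let target = rng.next_u32() % 16;
        if find_recursive(&tree, target) != find_iterative(&tree, target) {
            mismatches += 1;
        }
    }
    mismatches
}
```

A disagreement only shows that the two versions differ on some input. As the test above illustrates, it still has to be decided which of the two results is the intended one.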

andresilva · 2020-10-16T15:30:05Z

@expenses In that example D is a child of both B and C which is not possible since B and C are on different branches.

let is_descendent_of = |base: &&str, block: &&str| -> Result<bool, TestError> {
	match (*base, *block) {
		("A", b) => Ok(b == "B" || b == "C" || b == "D"),
		("C", b) => Ok(b == "D"),
		("0", _) => Ok(true),
		_ => Ok(false),
	}
};

I think the current implementation will early exit after checking B since it exploits the fact that D can only be a child of one branch.
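To illustrate the early exit, here is a minimal sketch of a search that commits to the first sibling reported as an ancestor and never looks at the others. It is a simplified model, not the fork-tree implementation: `Node`, `leaf`, `find_node_where` (with this signature) and `example` are assumptions made for the illustration.

```rust
struct Node {
    hash: &'static str,
    number: u32,
    data: u32,
    children: Vec<Node>,
}

fn leaf(hash: &'static str, number: u32, data: u32) -> Node {
    Node { hash, number, data, children: Vec::new() }
}

/// Returns the deepest ancestor of `hash` whose data passes `predicate`.
/// At each level only the first matching sibling is followed, which is
/// valid as long as a block has ancestors on a single branch.
fn find_node_where<'a>(
    roots: &'a [Node],
    hash: &str,
    number: u32,
    is_descendent_of: &dyn Fn(&str, &str) -> bool,
    predicate: &dyn Fn(u32) -> bool,
) -> Option<&'a Node> {
    let mut level = roots;
    let mut best = None;
    loop {
        let next = level
            .iter()
            .find(|node| node.number < number && is_descendent_of(node.hash, hash));
        match next {
            Some(node) => {
                if predicate(node.data) {
                    best = Some(node);
                }
                level = node.children.as_slice();
            }
            None => return best,
        }
    }
}

/// Runs the search with the inconsistent closure and with the corrected one.
pub fn example() -> (Option<&'static str>, Option<&'static str>) {
    let roots = vec![Node {
        hash: "A",
        number: 1,
        data: 1,
        children: vec![leaf("B", 2, 2), leaf("C", 2, 4)],
    }];
    // Inconsistent: D is reported as a descendant of both B and C.
    let on_both = |base: &str, block: &str| match (base, block) {
        ("A", b) => b == "B" || b == "C" || b == "D",
        ("B", b) | ("C", b) => b == "D",
        _ => false,
    };
    // Consistent: D descends from C only.
    let on_c_only = |base: &str, block: &str| match (base, block) {
        ("A", b) => b == "B" || b == "C" || b == "D",
        ("C", b) => b == "D",
        _ => false,
    };
    let wants_four = |data: u32| data == 4;
    (
        find_node_where(&roots, "D", 3, &on_both, &wants_four).map(|n| n.hash),
        find_node_where(&roots, "D", 3, &on_c_only, &wants_four).map(|n| n.hash),
    )
}
```

With the inconsistent closure the search follows B, finds nothing beneath it and stops. With the corrected closure it reaches C.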

expenses · 2020-10-19T10:46:25Z

@andresilva so this is caused by a bad is_descendent_of? I think we can ignore that for now then.

andresilva · 2020-10-27T10:23:02Z

utils/fork-tree/src/lib.rs

+	impl<H: Decode, N: Decode, V: Decode> Decode for Node<H, N, V> {
+		fn decode<I: codec::Input>(input: &mut I) -> Result<Self, codec::Error> {
+			let complete = |node: &Self| {
+				node.children.len() == node.children.capacity()


expenses · 2020-10-27T13:10:26Z

On master, cargo run --release -- --light syncs at roughly 1250 bps. On this branch, it syncs at roughly 16 bps, so something is clearly wrong here.

I've implemented the suggested changes in #7337 (comment) and #7337 (comment) and this issue seems to have been resolved 🎉

andresilva

Can we add a test for encoding/decoding and then making sure both structures are equal?

We will need to do a resync from scratch again of Polkadot and Kusama (with a full node), but we might as well do it just before we are ready to merge. It's a bit annoying but this touches consensus critical code and we really need to be sure we aren't breaking anything.

andresilva · 2020-10-27T18:26:50Z

utils/fork-tree/src/lib.rs

+				// beneath it.
+				if stack.last().map(complete).unwrap_or(false) {
+					let last = stack.pop().expect("We already checked this");
+					let latest = stack.last_mut().expect("We know there were 2 items on the stack");


Could you expand this proof, or maybe add a comment explaining why there must be at least 2 items on the stack? My understanding is that the only situation where we'd have just one item on the stack is if the root node has no children; otherwise we'll always have at least 2 nodes, which are the root and whatever children (or grandchildren, and so on) we're currently handling.
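One way to make the invariant explicit is to handle the single-item case directly instead of asserting it. The sketch below rebuilds a tree from a flat pre-order list with an explicit stack and round-trips it. It is a minimal model under stated assumptions, not the SCALE `Decode` impl from this diff: `Node`, `encode` and `decode` are hypothetical, and the number of children still expected is stored next to each node rather than derived from `capacity()`, since `Vec::with_capacity` only guarantees at least the requested capacity.

```rust
#[derive(Debug, PartialEq)]
struct Node {
    data: u32,
    children: Vec<Node>,
}

/// Flattens the tree in pre-order; each entry is (data, number of children).
fn encode(root: &Node) -> Vec<(u32, usize)> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        out.push((node.data, node.children.len()));
        // Push in reverse so that the first child is visited first.
        stack.extend(node.children.iter().rev());
    }
    out
}

/// Rebuilds the tree without recursion. Returns `None` on malformed input.
fn decode(input: &[(u32, usize)]) -> Option<Node> {
    let mut entries = input.iter();
    let &(data, expected) = entries.next()?;
    // Each stack entry is a partially built node plus its expected child count.
    let mut stack = vec![(Node { data, children: Vec::new() }, expected)];
    loop {
        let complete = match stack.last() {
            Some((node, expected)) => node.children.len() == *expected,
            None => return None,
        };
        if complete {
            let (node, _) = stack.pop()?;
            match stack.last_mut() {
                // At least two items were on the stack: attach to the parent.
                Some((parent, _)) => parent.children.push(node),
                // Only one item was on the stack: it is the finished root
                // (either childless or with all of its children attached).
                None => {
                    return if entries.next().is_none() { Some(node) } else { None };
                }
            }
        } else {
            let &(data, expected) = entries.next()?;
            stack.push((Node { data, children: Vec::new() }, expected));
        }
    }
}
```

In this model a single item on the stack is not limited to a childless root: it also occurs at the very end, once every child has been folded back into the root.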

andresilva · 2020-10-27T18:33:58Z

utils/fork-tree/src/lib.rs

+
+	#[test]
+	fn find_node_where_value_2() {
+		let mut tree = ForkTree::new();


We should add a comment on what's going on in this test. IIRC this is the test with the funky is_descendent_of where D is a child on two different forks. What is the behavior we are testing here? Initially we wanted to test that we bail out early when we test against B and the predicate fails (i.e. we wouldn't test C since D was already a known descendent of B).

Yeah, after the changes I don't think we're testing for anything specific here, just correctness.

expenses · 2020-11-04T14:24:44Z

@andresilva Any suggestions on how I could fix the failing grandpa test?

expenses · 2020-11-05T16:11:30Z

When I tried to sync Polkadot on the light client using this, I got the following at block 9,593:
2020-11-05 14:31:19 💔 Verification failed for block 0x3f276962de02b4ce592df31ddfa7fdc595637ab042515ea2e685d1ac5c0ff9bf received from peer: 12D3KooWKBAmuHPd8xFFszN7qFXLmTDmtffbk5wGRxkgrwnqEDaG, "Invalid author: Expected secondary author: Public(5e67b64cf07d4d258a47df63835121423551712844f5b67de68e36bb9a21e127 (138nJzAB...)), got: Public(6236877b05370265640c133fec07e64d7ca823db1dc56f2d3584b3d7c0f16158 (13DmthD9...))."

andresilva · 2020-11-05T16:22:24Z

I guess the implementations I posted have a bug. The error you're getting means that some transition wasn't applied correctly, and the failing grandpa test indicates the same. I'll have to look into the grandpa test.

andresilva · 2020-11-05T17:00:18Z

Regarding the grandpa test, the issue is with the test itself: we add two pending changes that are supposed to be on different forks, but we pass an is_descendent_of function that always returns true. This worked before because the import method verified the block number before calling is_descendent_of, which isn't the case now. Either way this should be fine, as the is_descendent_of used by the client should be correctly defined and would not rely on block numbers being checked to decide whether two blocks are on the same fork. Changing https://github.com/paritytech/substrate/blob/master/client/finality-grandpa/src/authorities.rs#L796-L797 to static_is_descendent_of(false) should make the test pass.
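The ordering difference can be shown with a small sketch. It is a one-level model under assumptions, not the real import code: `Node`, `new_node`, `import_number_first`, `import_ancestry_first` and `example` are hypothetical, and only root nodes are considered as possible parents.

```rust
struct Node {
    hash: &'static str,
    number: u32,
    children: Vec<Node>,
}

fn new_node(hash: &'static str, number: u32) -> Node {
    Node { hash, number, children: Vec::new() }
}

/// Checks the block number before asking `is_descendent_of`.
fn import_number_first(
    roots: &mut Vec<Node>,
    hash: &'static str,
    number: u32,
    is_descendent_of: &dyn Fn(&str, &str) -> bool,
) {
    let node = new_node(hash, number);
    for root in roots.iter_mut() {
        if root.number < number && is_descendent_of(root.hash, hash) {
            root.children.push(node);
            return;
        }
    }
    // No suitable parent found: the block starts a new fork.
    roots.push(node);
}

/// Trusts `is_descendent_of` alone, without looking at block numbers.
fn import_ancestry_first(
    roots: &mut Vec<Node>,
    hash: &'static str,
    is_descendent_of: &dyn Fn(&str, &str) -> bool,
) {
    // The number is only stored here, never compared.
    let node = new_node(hash, 1);
    for root in roots.iter_mut() {
        if is_descendent_of(root.hash, hash) {
            root.children.push(node);
            return;
        }
    }
    roots.push(node);
}

/// Imports two blocks at the same height with an always-true closure and
/// returns the number of roots produced by each variant.
pub fn example() -> (usize, usize) {
    let always_true = |_: &str, _: &str| true;

    let mut number_first = Vec::new();
    import_number_first(&mut number_first, "A", 1, &always_true);
    import_number_first(&mut number_first, "B", 1, &always_true);

    let mut ancestry_first = Vec::new();
    import_ancestry_first(&mut ancestry_first, "A", &always_true);
    import_ancestry_first(&mut ancestry_first, "B", &always_true);

    (number_first.len(), ancestry_first.len())
}
```

In the first variant the number check masks the incorrect closure and the two blocks stay on separate forks. In the second, B ends up as a child of A.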

There's still something wrong though if you're getting that light client error when syncing. I wonder if it's related to rebalancing, maybe just commenting that code and trying to replicate the issue would allow us to pinpoint it.

andresilva · 2020-11-05T17:17:05Z

Never mind, I don't think it could be related to rebalance(), as it doesn't alter the tree structure (only the ordering of branches).
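The claim can be checked mechanically: reorder children, then compare the set of parent/child edges before and after. The sketch below does that on a toy tree. `Node`, `depth`, `rebalance` and `edges` are hypothetical names, and sorting children by decreasing subtree depth is an assumption about what rebalance() does rather than a copy of it.

```rust
struct Node {
    hash: &'static str,
    children: Vec<Node>,
}

/// Depth of the subtree rooted at `node` (recursive here only for brevity).
fn depth(node: &Node) -> usize {
    1 + node.children.iter().map(depth).max().unwrap_or(0)
}

/// Reorders children so the deepest branch comes first, at every level.
fn rebalance(root: &mut Node) {
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        node.children
            .sort_by_key(|child| std::cmp::Reverse(depth(child)));
        stack.extend(node.children.iter_mut());
    }
}

/// Sorted list of (parent, child) pairs, independent of child ordering.
fn edges(root: &Node) -> Vec<(&'static str, &'static str)> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        for child in &node.children {
            out.push((node.hash, child.hash));
            stack.push(child);
        }
    }
    out.sort();
    out
}

/// Builds A -> [B, C -> [D]] as a small fixture.
fn sample_tree() -> Node {
    Node {
        hash: "A",
        children: vec![
            Node { hash: "B", children: Vec::new() },
            Node {
                hash: "C",
                children: vec![Node { hash: "D", children: Vec::new() }],
            },
        ],
    }
}
```

If the edge lists match while the child order differs, any lookup that depends only on ancestry should be unaffected by the reordering.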

expenses · 2020-11-12T13:06:22Z

@cheme would it be a good idea to merge this branch: master...cheme:iter_forktree into this?

cheme · 2020-11-12T13:18:38Z

I am not against it, it depends how much we want to avoid unsafe code (for my part I find it fine, but it can be a vast debate).

There is also the option to keep the recursive code since it is a bit more readable. (I think that the stack overflow is a limit case related to blocks not being finalized, so it could be acceptable to skip the switch.)
If keeping the recursive code, I think we should still keep the new tests and add this fix: a9f13cc.

andresilva · 2020-11-12T14:54:02Z

No, let's not add unsafe code here, that seems a bit unnecessary IMO. I'm pretty sure the issue you have observed isn't related to the rebalance() method, since the code I provided doesn't change the tree structure (the only mutation is sorting vecs of children), so it cannot affect correctness. This can be verified by commenting out all the code in rebalance() and observing that the crash still happens.

> There is also the option to keep recursive code since it is a bit more readable. (I think that the stack overflow is a limit case related to block not being finalized, so it could be acceptable to skip the switch).

For me recursive code is a lot easier to read, especially in tree structures, although I know it's not like that for everyone. It is true that this is an edge case that shouldn't happen under normal operation (i.e. if we finalize something every now and then), but it would still be nice if we didn't stack overflow. Given that we already spent some time on this I'd be in favor of still trying to merge these changes to make the algorithms iterative. I will try to debug this but I cannot make any promises right now.
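As a rough illustration of the trade-off, a traversal with an explicit stack keeps its bookkeeping on the heap, so a very long unfinalized chain does not deepen the call stack. The sketch below is not fork-tree code: `Tree`, `chain` and `max_depth` are hypothetical, and the tree is stored as an index-based arena so that dropping a long chain in the example does not recurse either.

```rust
/// Arena-backed tree: `children[i]` lists the indices of node `i`'s children.
struct Tree {
    children: Vec<Vec<usize>>,
}

/// Builds a single chain 0 -> 1 -> ... -> len - 1.
fn chain(len: usize) -> Tree {
    let children = (0..len)
        .map(|i| if i + 1 < len { vec![i + 1] } else { Vec::new() })
        .collect();
    Tree { children }
}

/// Depth of the deepest node reachable from `root`, computed with a loop.
fn max_depth(tree: &Tree, root: usize) -> usize {
    let mut deepest: usize = 0;
    // Each entry is (node index, depth of that node).
    let mut stack: Vec<(usize, usize)> = vec![(root, 1)];
    while let Some((index, depth)) = stack.pop() {
        deepest = deepest.max(depth);
        for &child in &tree.children[index] {
            stack.push((child, depth + 1));
        }
    }
    deepest
}
```

A recursive version of `max_depth` would use one call frame per level of the chain, which is the pattern that overflows when nothing is finalized for a long time.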

expenses · 2020-11-12T15:33:47Z

> Given that we already spent some time on this I'd be in favor of still trying to merge these changes to make the algorithms iterative.

Sunk cost fallacy 😉

All I want is to stop the stack overflow issue from happening so the light client is usable. How we solve this problem doesn't really matter. It'd probably be better if we didn't use recursion, because this issue could come back if we solve it in a different way, but we can keep this branch around for that case.

expenses · 2020-12-03T14:42:25Z

On hold for now.

expenses added 7 commits October 7, 2020 16:16

Fix import

dca7ce3

Add custom decode impl

21f7e45

Remove serialization tests

3ab6abb

Use while pop

8cd0640

Merge remote-tracking branch 'origin/master' into ashley-fork-tree

c54b5e0

Remvoe decode

8286729

Does this work??

de8b504

expenses added A0-please_review Pull request needs code review. A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Oct 16, 2020

github-actions bot removed the A0-please_review Pull request needs code review. label Oct 16, 2020

expenses requested review from andresilva and bkchr October 16, 2020 08:49

Add find_index_where tests

6a0c36c

Merge branch 'ashley-fork-tree-in-prog' into ashley-fork-tree-only-im…

b09ea8d

…port

Might be working

12b8598

expenses added 2 commits October 16, 2020 15:33

Fingers crossed!

d7f8448

Add more tests

c7d405f

expenses added 2 commits October 19, 2020 12:44

Tidy up a little

490b393

Merge branch 'ashley-fork-tree-maybe-working' into ashley-fork-tree-i…

9b3232c

…n-prog

andresilva reviewed Oct 27, 2020

expenses added 4 commits October 27, 2020 13:26

Add values to default test fork tree

76d6688

Switch to alternate implementations

9cf44e0

Merge remote-tracking branch 'origin/master' into ashley-fork-tree-in…

f21a332

…-prog

Fix sc-consensus-epochs and sc-finality-grandpa

89c9b21

github-actions bot removed the A7-needspolkadotpr label Oct 27, 2020

Update test values

cc5baf3

expenses requested review from andresilva and cheme October 27, 2020 15:07

andresilva suggested changes Oct 27, 2020

expenses and others added 2 commits November 4, 2020 14:37

Apply suggestions from code review

c441b61

Co-authored-by: André Silva <[email protected]>

Merge remote-tracking branch 'origin/master' into ashley-fork-tree-in…

706df68

…-prog

github-actions bot added the A7-needspolkadotpr label Nov 4, 2020

expenses added 2 commits November 4, 2020 14:53

Write out decode proofs

d5046fa

Add decoding test

14dd951

github-actions bot removed the A7-needspolkadotpr label Nov 4, 2020

Fix grandpa test :D

fa18a1a

expenses closed this Dec 3, 2020

expenses deleted the ashley-fork-tree-in-prog branch August 23, 2021 11:11
