
feat: restart ETH witnessers on failure #2791

Merged: 21 commits merged into main on Jan 31, 2023

Conversation

kylezs (Contributor) commented Jan 30, 2023

On any failure this will restart the ETH witnessers. This includes creating a fresh ETH WS client, which should resolve most, if not all of the WS connection issues.

Can reuse a lot of code and concepts to do the same for DOT.

Most of the pain (read: fun) in this PR was borrow checker issues. The nice part about that is that it's a little easier to verify this works just by looking at the types, since it compiles :)

Out of scope

  • What to do if the witnesser checkpointing panics? (#2778) This can be done separately; this PR is already big enough.
  • Retry limit. Given how long-running this will be, I don't think a retry limit is strictly necessary, particularly for the witnessers. It may become useful for some other case, but for now I don't think it's needed.
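
In rough terms, the restart looks like the following minimal sketch; the client and witnesser functions here are illustrative stand-ins, not the actual crate code.

struct EthWsClient;

fn connect_ws_client() -> Result<EthWsClient, String> {
	// Imagine this dials the ETH node over websockets.
	Ok(EthWsClient)
}

fn run_eth_witnessers(_client: &EthWsClient) -> Result<(), String> {
	// Imagine this runs until the WS connection (or anything else) fails.
	Err("ws connection dropped".to_string())
}

fn main() {
	let mut restarts = 0u32;
	// On any failure, everything is dropped and started again with a fresh WS
	// client, so a broken connection is replaced rather than reused.
	while let Err(err) = connect_ws_client().and_then(|client| run_eth_witnessers(&client)) {
		restarts += 1;
		eprintln!("ETH witnessers failed ({err}); restarting (restart #{restarts})");
		if restarts >= 3 {
			break; // bounded here only so the sketch terminates; the PR itself has no retry limit
		}
	}
}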

kylezs force-pushed the fix/restart-witness branch from 2b59a91 to 72e2b53 on January 30, 2023 10:32
kylezs force-pushed the fix/restart-witness branch from 4209604 to 2303cae on January 30, 2023 13:18
kylezs force-pushed the fix/restart-witness branch 2 times, most recently from a2490b9 to 8a935e2 on January 30, 2023 14:23
@@ -61,7 +62,7 @@ where
 )> = None;

 loop {
-	let epoch_start = epoch_start_receiver.recv().await?;
+	let epoch_start = epoch_start_receiver.recv().await.expect("Sender closed");
kylezs (author) commented on the diff:
Not using expect here is quite annoying to solve, and given that panicking here shouldn't happen, and if it did it would be a symptom of another problem, I figure it's pretty reasonable to just panic.

kylezs force-pushed the fix/restart-witness branch from f2737d7 to 53141ae on January 30, 2023 15:21
kylezs force-pushed the fix/restart-witness branch from 53141ae to 5899ab3 on January 30, 2023 15:23
kylezs requested a review from dandanlen on January 30, 2023 15:24
kylezs marked this pull request as ready for review on January 30, 2023 15:24
kylezs requested a review from AlastairHolmes on January 30, 2023 15:26
dandanlen (Collaborator) left a comment:
Will take a closer look tomorrow.

}
}
assert_eq!(my_state, 5);
assert_eq!(counter, end_number - my_state);
dandanlen (Collaborator) commented:
I think this test is a bit convoluted and could be clarified.

For example, it's not clear what these assertions are testing or why these numbers matter. It may just be a naming thing: I'm guessing the number of restarts is significant here, so maybe something should be called restart_count and something else should be called iteration_count.

kylezs (author) replied Jan 31, 2023:
Simplified so that the counting isn't important (which it isn't); the test now just tracks the restarts and then the final success.

dandanlen (Collaborator) replied:
There are no more assertions - is this intentional?

kylezs (author) replied:
The assertions are implicit. We'd be adding assert_eq!(restart_count, 6) right after an if statement that already checks exactly that. And we know that branch is the one that returns, since we panic!() at the end otherwise.

The restart_count also cannot reach 6 unless it does restart, which requires hitting the error block 5 times, and there is only the one error case.

kylezs (author) added:
Effectively, the only way this test can hit the Ok() case is if the restart code works.

dandanlen (Collaborator) replied:
Ok - I made this point to Roy re. some of the swapping tests - we should try to keep tests as simple as they can be. In this case, if we want to test that it restarts, there's no need for the for loop counting to 10, and probably no need to restart six times either - if it restarts once, it will restart again.
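
Something like the following minimal sketch would cover it; restart_on_failure and the closure are illustrative stand-ins, not the repo's actual helpers.

fn restart_on_failure<S, F>(mut state: S, mut task: F) -> S
where
	F: FnMut(S) -> Result<S, S>,
{
	loop {
		match task(state) {
			Ok(final_state) => return final_state,
			// On failure the task hands its state back and is simply restarted.
			Err(recovered_state) => state = recovered_state,
		}
	}
}

#[test]
fn restarts_failed_task_until_it_succeeds() {
	let restarts = restart_on_failure(0u32, |restarts| {
		// Fail exactly once, handing the incremented counter back to the runner.
		if restarts == 0 {
			Err(restarts + 1)
		} else {
			Ok(restarts)
		}
	});
	// Reaching Ok at all proves the runner restarted the task after the failure.
	assert_eq!(restarts, 1);
}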

dandanlen (Collaborator) commented:

LGTM.

I feel like having to override the actual error with the state is unfortunate; it means we lose any error context. I still think the best option is to convince the compiler of the correct lifetimes so that we don't need to return the state at all, but I can see how this isn't possible currently.

What if we use a custom return wrapper like this:

pub struct TaskRecovery<S, E> {
	recovery_state: Option<S>,
	reason: E,
}

impl<S, E: std::error::Error> From<E> for TaskRecovery<S, E> {
	fn from(e: E) -> Self {
		TaskRecovery { recovery_state: None, reason: e }
	}
}

And then we can use map_err / ? instead of a custom macro; the error can bubble up and we handle/log it in the restart code.

This also allows us to have a (non-panic) way to return an unrecoverable error, as before.
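
For illustration, a rough sketch of how the wrapper could be used with ?, assuming the TaskRecovery above is in the same module; the witnesser state and task here are invented for the example, not taken from the crate.

#[derive(Debug)]
struct WitnesserState {
	last_block: u64,
}

fn process_block(_block: u64) -> Result<(), std::io::Error> {
	// Pretend this is the fallible RPC work.
	Ok(())
}

fn witness(mut state: WitnesserState) -> Result<(), TaskRecovery<WitnesserState, std::io::Error>> {
	while state.last_block < 10 {
		// `?` goes through the `From` impl above, so a plain error bubbles up
		// with `recovery_state: None` and only the reason is preserved.
		process_block(state.last_block)?;

		if state.last_block == 5 {
			// Where the state is known to be consistent, attach it explicitly
			// so the restart code can resume from it rather than starting over.
			return Err(TaskRecovery {
				recovery_state: Some(state),
				reason: std::io::Error::new(std::io::ErrorKind::Other, "simulated failure"),
			});
		}
		state.last_block += 1;
	}
	Ok(())
}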

kylezs (author) commented Jan 31, 2023:

I agree that some of the error handling can probably be improved, but figured it can wait, partly because I'm not certain of a good direction at the moment. We can probably generalise the task_scope a bit further to allow for some ideas like this too.

And then we can use map_err / ? instead of a custom macro,

In the cases where the macros are used, we're inside a future, so we need them regardless to get the state out of the future; otherwise the future owns that state and we can't access it from outside. Adding a .map_err() and returning the owned value means the closure necessarily captures that value, and the compiler isn't smart enough to know that we're returning directly afterwards anyway (via the ?), which is why the macro is required. Effectively the macro expands to a map_err in a lifetime/compiler-friendly way.
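
Roughly, the problem and the macro's shape look like the following sketch; the macro name and types are illustrative, not the actual macro in the repo.

macro_rules! try_or_return_state {
	($state:expr, $result:expr) => {
		match $result {
			Ok(ok) => ok,
			// The early return moves `$state` out of the future only on the
			// error path, without any closure ever capturing it.
			Err(_) => return Err($state),
		}
	};
}

fn fallible() -> Result<u32, ()> {
	Ok(1)
}

async fn task(state: Vec<u64>) -> Result<(), Vec<u64>> {
	// With `.map_err(|_| state)?` the closure would capture (move) `state`
	// unconditionally, so it could not be used again below, even though the
	// `?` returns immediately on the error path. The macro avoids that.
	let _value = try_or_return_state!(state, fallible());
	println!("state is still usable here, len = {}", state.len());
	Ok(())
}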

kylezs merged commit 127c7d6 into main on Jan 31, 2023
kylezs deleted the fix/restart-witness branch on January 31, 2023 15:36
kylezs linked an issue on Jan 31, 2023 that may be closed by this pull request
Successfully merging this pull request may close these issues.

Websocket RPC client should automatically reconnect if it becomes broken