-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: restart ETH witnessers on failure #2791
Conversation
2b59a91
to
72e2b53
Compare
72e2b53
to
4209604
Compare
4209604
to
2303cae
Compare
a2490b9
to
8a935e2
Compare
@@ -61,7 +62,7 @@ where | |||
)> = None; | |||
|
|||
loop { | |||
let epoch_start = epoch_start_receiver.recv().await?; | |||
let epoch_start = epoch_start_receiver.recv().await.expect("Sender closed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not expecting here is quite annoying to solve, and given panicking here shouldn't happen, and if it did it'd be a symptom of another problem, I figure it's pretty reasonable to just panic.
f2737d7
to
53141ae
Compare
53141ae
to
5899ab3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will take a closer look tomorrow.
engine/src/common.rs
Outdated
} | ||
} | ||
assert_eq!(my_state, 5); | ||
assert_eq!(counter, end_number - my_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is a bit convoluted - can be clarified a bit.
For example, it's not clear what these assertions are testing or why these numbers matter. Maybe just a naming thing, for example I'm guessing the number of restarts is significant here, so maybe something should be called restart_count
and something else should be called iteration_count
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplified so the counting isn't important (which it isn't), just tracks the restarts and then the final success now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no more assertions - is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertions are implicit. We'd be adding an assert assert_eq!(restart_count, 6)
right after we have an if statement that says it's so. And we know this statement is the one that returns since we panic!() at the end.
The restart_count also cannot reach 6 unless it does restart, which requires hitting the error block 5 times, and there is only the one error case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Effectively, the only way this test can hit the Ok() case, is if the restart code works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - I made this point to Roy re. some of the swapping tests - we should try to keep tests as simple as the can be. In this case, if we want to test that it restarts, there's no need for the for loop counting to 10, and probably no need to restart six times either - if it restarts once, it will restart again.
LGTM. I feel like having to override the actual error with the state is unfortunate, it means we lose any error context. I still think the best option is to convince the compiler of the correct lifetimes so that we don't need to return the state at all, but I can see how this isn't possible currently. What if we use a custom return wrapper like this: pub struct TaskRecovery<S, E>{ recovery_state: Option<S>, reason: E }
impl<S, E: std::error::Error> From<E> for TaskRecovery<S, E> {
fn from(e: E) -> Self {
TaskRecovery { state: None, reason: e }
}
} And then we can use This also allows us to have a (non-panic) way to return an unrecoverable error, as before. |
I agree that some of the error handling can probably be improved, but figured it can wait, also because I'm not certain on a good direction atm. Can probably generalise the task_scope a bit further to allow for some ideas like this too.
In the cases the macros are used, it's inside a future, so we'd need to use them regardless, to get the state out of the future, since the future owns that state otherwise, we can't access inside of it. Adding a |
On any failure this will restart the ETH witnessers. This includes creating a fresh ETH WS client, which should resolve most, if not all of the WS connection issues.
Can reuse a lot of code and concepts to do the same for DOT.
Most of the pain (read: fun) in this PR was borrow checker issues. But the nice part about that is, it's a little easier to verify that this works just by looking at the types, since this compiles :)
Out of scope