Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[narwhal] fix unlocking resources in header_waiter when error #5954

Merged
merged 1 commit into from
Nov 10, 2022

Conversation

akichidis
Copy link
Contributor

@akichidis akichidis commented Nov 8, 2022

This PR is fixing an undesired behaviour in header_waiter. When a request fails (ex fetch batches or fetch parents) we simply warn but we do not clear the internal maps. So any next attempt to fetch those will simply get ignored. We saw a few cases in restartability where this behaviour ended up in loss of liveness.

Now, this component is not only having this issue and there are several things that should be fixed (ex the way we perform the clean ups, no retries which again can lead to loss of liveness). I am not performing any further refactoring as I know that @aschran is working to remove completely, but I prefer to have something now so could fix at least a known problem.

@vercel
Copy link

vercel bot commented Nov 8, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
explorer ✅ Ready (Inspect) Visit Preview Nov 8, 2022 at 11:09PM (UTC)

@akichidis akichidis marked this pull request as ready for review November 9, 2022 10:37
self.cleanup_pending_requests(&header);

// Ok to drop the header if core is overloaded.
let _ = self.tx_headers_loopback.try_send(header);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we used to have retries in header_waiter it was ok to not send the header back to core - and do it only in the success scenario (like here). I believe though this now could create under some circumstances a loss of liveness. I keep the behaviour as is and not re-send header back to Core in case of error so can avoid any infinite loop that could be caused here without necessary backoff. As I said on the description @aschran will completely remove this component so don't want to spend more effort now

Copy link
Contributor

@lanvidr lanvidr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@akichidis akichidis merged commit 49a2830 into main Nov 10, 2022
@akichidis akichidis deleted the fix/header-waiter-error-handling branch November 10, 2022 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants