-
Notifications
You must be signed in to change notification settings - Fork 955
Sync cleanups #8230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync cleanups #8230
Conversation
dapplion
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just two suggestions to document behaviour
| if !*exceeded_retries { | ||
| // Set the batch back to `AwaitingDownload` before retrying. | ||
| // This is to ensure that the batch doesn't get stuck in `Downloading` state. | ||
| batch.download_failed(None)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check the returned outcome here?
What happens if it returns Ok(BatchState::Failed)?
Looking at the code it looks like we may hit this?
lighthouse/beacon_node/network/src/sync/range_sync/batch.rs
Lines 344 to 349 in fb77ce9
| other => { | |
| self.state = other; | |
| Err(WrongState(format!( | |
| "Starting download for batch in wrong state {:?}", | |
| self.state | |
| ))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. I realised I was also incrementing the batch failed count when we shouldn't be as we want to keep the batch retries separate from the column retries in the original retry PR.
Fixed in ac8adb4 . Please take a look
| // | ||
| // DataColumn retries has a retry limit so calling `downloading_to_awaiting_download` | ||
| // is safe. | ||
| if let BatchOperationOutcome::Failed { blacklist } = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BatchOperationOutcome::Failed is no longer reachable after changing the call to downloading_to_awaiting_download.
Also i think the function name conveys clearly what it does, but not its intended usage, but I also struggle to come up with something better.
batch.download_failed_skip_attempt_count seems a bit too wordy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think handling BatchOperationOutcome::Failed is a good idea though.
jimmygchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too! Thanks for the cleanup
|
This pull request has been removed from the queue for the following reason: The merge conditions cannot be satisfied due to failing checks: You can check the last failing draft PR here: #8242. You may have to fix your CI before adding the pull request to the queue again. |
|
@mergify requeue |
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically |
|
This pull request has been removed from the queue for the following reason: The merge conditions cannot be satisfied due to failing checks: You can check the last failing draft PR here: #8243. You may have to fix your CI before adding the pull request to the queue again. |
|
@mergify requeue |
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically |
|
This pull request has been removed from the queue for the following reason: The merge conditions cannot be satisfied due to failing checks: You can check the last failing draft PR here: #8244. You may have to fix your CI before adding the pull request to the queue again. |
|
@mergify requeue |
✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically |
N/A 1. In the batch retry logic, we were failing to set the batch state to `AwaitingDownload` before attempting a retry. This PR sets it to `AwaitingDownload` before the retry and sets it back to `Downloading` if the retry suceeded in sending out a request 2. Remove all peer scoring logic from retrying and rely on just de priorotizing the failed peer. I finally concede the point to @dapplion 😄 3. Changes `block_components_by_range_request` to accept `block_peers` and `column_peers`. This is to ensure that we use the full synced peerset for requesting columns in order to avoid splitting the column peers among multiple head chains. During forward sync, we want the block peers to be the peers from the syncing chain and column peers to be all synced peers from the peerdb. Also, fixes a typo and calls `attempt_send_awaiting_download_batches` from more places Co-Authored-By: Pawan Dhananjay <[email protected]>
Issue Addressed
N/A
Proposed Changes
AwaitingDownloadbefore attempting a retry. This PR sets it toAwaitingDownloadbefore the retry and sets it back toDownloadingif the retry suceeded in sending out a requestblock_components_by_range_requestto acceptblock_peersandcolumn_peers. This is to ensure that we use the full synced peerset for requesting columns in order to avoid splitting the column peers among multiple head chains. During forward sync, we want the block peers to be the peers from the syncing chain and column peers to be all synced peers from the peerdb.Also, fixes a typo and calls
attempt_send_awaiting_download_batchesfrom more places