chore(sequencer-relayer)!: minimize resubmissions to Celestia#1234
Merged
chore(sequencer-relayer)!: minimize resubmissions to Celestia#1234
Conversation
…estia-resubmissions
SuperFluffy
approved these changes
Jul 23, 2024
Contributor
SuperFluffy
left a comment
There was a problem hiding this comment.
This looks great and I don't see any outright blockers, just a bunch of requests for stylistic changes and tightening naming (to make the use of types and states clearer on a first read).
A big request I have is to not omit the use of context in error handling. While an author might have reason to avoid the context (for example, because ample information is provided in the called function), it's not at all clear for a reviewer. :)
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
crates/astria-sequencer-relayer/src/relayer/celestia_client/mod.rs
Outdated
Show resolved
Hide resolved
quasystaty1
approved these changes
Jul 24, 2024
Contributor
quasystaty1
left a comment
There was a problem hiding this comment.
LGTM, tested locally
joroshiba
reviewed
Jul 24, 2024
bharath-123
pushed a commit
that referenced
this pull request
Jul 25, 2024
## Summary The relayer has been updated to avoid resubmitting the same data to Celestia on restart or on a timeout of the `BroadcastTx` gRPC. ## Background Currently the relayer will likely resubmit the same data when it restarts, since the majority of the time spent in the submit loop is waiting for confirmation of the `BlobTx` having been stored. While in this state, the on-disk files aren't updated to indicate that we're waiting for confirmation. Hence if we restart, the new session begins by submitting all data from the last-confirmed point, i.e. the same data which was already likely successfully submitted in the final attempt of the previous session. A similar situation happens if we timeout while waiting for the `BroadcastTx` response - the retry loop will attempt to resubmit the same data without first checking if the previous attempt succeeded. ## Changes * Replaced the two state files (presubmit and postsubmit) with a single one (submission-state) and similarly for their respective env vars. See [the updated spec](https://github.com/astriaorg/astria/blob/c110bdbf7d0fd05fba04775d07d046c37bbd7372/specs/sequencer-relayer.md#submission-state-file) for further details. * The `submission` module was heavily changed: there are now two primary state structs `StartedSubmission` and `PreparedSubmission`, between which the blob submitter toggles during normal operation. There is also `FreshSubmission` which is only relevant at startup, and an enum covering these three (`SubmissionStateAtStartup`) which is also only used during startup. Finally, the `State` enum is an ephemeral object only used to read/write the relevant state from/to disk. * `BlobSubmitter::run` was modified to try to confirm the last submission attempt from the previous session if the state file indicated the relayer exited while in `prepared` state. If that submission is confirmed (the most likely outcome), then the sequencer blocks in that submission are simply skipped in the write loop. (We could try to avoid even fetching these sequencer blocks, but that would be a significantly more pervasive change, and is probably not worth the complexity). * The relayer's celestia client was changed to split `try_submit` into `try_prepare` and `try_submit` so that the hash of the prepared `BlobTx` can be returned from `try_prepare` to be recorded in the state file before the transaction is broadcast to the Celestia app. * `submit_with_retry` was updated to check for a broadcast timeout error in the previous attempt, and in that case, attempts to just confirm that submission rather than automatically resubmitting the same data. ## Testing * Unit tests for the new `submission` types. * Black box test `later_height_in_state_leads_to_expected_relay` was updated. * Manually observed expected behaviour against a locally-running sequencer and Celestia app. ## Breaking Changelist * Removed `ASTRIA_SEQUENCER_RELAYER_PRE_SUBMIT_PATH` and `ASTRIA_SEQUENCER_RELAYER_POST_SUBMIT_PATH` config env vars. * Added `ASTRIA_SEQUENCER_RELAYER_SUBMISSION_STATE_PATH` config env var. ## Related Issues Closes #1200.
steezeburger
added a commit
that referenced
this pull request
Jul 29, 2024
* main: release: cut bridge withdrawer release (#1303) release: version cuts for dusk-9 (#1299) chore(core): remove ed25519_consensus from public API (#1277) chore: remove spurious entry in gitignore (#1276) chore(chart): Update EVM-Rollup Geth devTag (#1300) fix(proto)!: Change execution API to use primitive RollupId (#1291) refactor(core, proto)!: define bridge memos in proto (#1285) chore(sequencer-relayer)!: minimize resubmissions to Celestia (#1234)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The relayer has been updated to avoid resubmitting the same data to Celestia on restart or on a timeout of the
BroadcastTxgRPC.Background
Currently the relayer will likely resubmit the same data when it restarts, since the majority of the time spent in the submit loop is waiting for confirmation of the
BlobTxhaving been stored. While in this state, the on-disk files aren't updated to indicate that we're waiting for confirmation. Hence if we restart, the new session begins by submitting all data from the last-confirmed point, i.e. the same data which was already likely successfully submitted in the final attempt of the previous session.A similar situation happens if we timeout while waiting for the
BroadcastTxresponse - the retry loop will attempt to resubmit the same data without first checking if the previous attempt succeeded.Changes
submissionmodule was heavily changed: there are now two primary state structsStartedSubmissionandPreparedSubmission, between which the blob submitter toggles during normal operation. There is alsoFreshSubmissionwhich is only relevant at startup, and an enum covering these three (SubmissionStateAtStartup) which is also only used during startup. Finally, theStateenum is an ephemeral object only used to read/write the relevant state from/to disk.BlobSubmitter::runwas modified to try to confirm the last submission attempt from the previous session if the state file indicated the relayer exited while inpreparedstate. If that submission is confirmed (the most likely outcome), then the sequencer blocks in that submission are simply skipped in the write loop. (We could try to avoid even fetching these sequencer blocks, but that would be a significantly more pervasive change, and is probably not worth the complexity).try_submitintotry_prepareandtry_submitso that the hash of the preparedBlobTxcan be returned fromtry_prepareto be recorded in the state file before the transaction is broadcast to the Celestia app.submit_with_retrywas updated to check for a broadcast timeout error in the previous attempt, and in that case, attempts to just confirm that submission rather than automatically resubmitting the same data.Testing
submissiontypes.later_height_in_state_leads_to_expected_relaywas updated.Breaking Changelist
ASTRIA_SEQUENCER_RELAYER_PRE_SUBMIT_PATHandASTRIA_SEQUENCER_RELAYER_POST_SUBMIT_PATHconfig env vars.ASTRIA_SEQUENCER_RELAYER_SUBMISSION_STATE_PATHconfig env var.Related Issues
Closes #1200.