Introduce an explicit delay for QUALITY phase #785

Closed
Tracked by #792
masih opened this issue Dec 11, 2024 · 1 comment · Fixed by #805
@masih
Member

masih commented Dec 11, 2024

Capturing what was discussed in the 2024-12-11 standup. Below is a summary of an idea originally proposed by @Kubuxu:

The duration of the QUALITY phase is governed directly by delta. The phase waits for the full timeout, or proceeds early if there is a strong quorum for the proposal.

The QUALITY phase is also very important to the progress velocity of an instance: insufficiently propagated QUALITY messages lead to PREPARE for base and, most likely, additional rounds. This behaviour was observed repeatedly in mainnet testing, especially during the bootstrap phase. That is less than ideal, because during bootstrap there is really no chain forkiness: the entire network is trying to decide on chains that have a far lower probability of reorg than chains at steady state.

We could simply increase delta to give QUALITY messages enough time to propagate, but a larger delta affects every phase and so further increases the time an instance takes to terminate. It would specifically impact the CONVERGE phase, because that phase also waits for the full timeout regardless.

So the proposal here is to introduce a dedicated timeout for the QUALITY phase, at least in the dynamic manifest for testing purposes. If it proves successful, we can then propose the change in a FIP.
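A rough Go sketch of the trade-off between the two options above. The `Timeouts` type, its fields and the `WorstCase` helper are hypothetical names made up for illustration, not the project's API. It also assumes, as a simplification, that each phase waits exactly delta before timing out; the real protocol may scale delta per phase or per round.

```go
package main

import (
	"fmt"
	"time"
)

// Phase names a protocol phase. Only the phases discussed in this issue are listed.
type Phase string

const (
	Quality  Phase = "QUALITY"
	Converge Phase = "CONVERGE"
	Prepare  Phase = "PREPARE"
)

// Timeouts is a hypothetical configuration: Delta drives every phase, and
// QualityTimeout, when non-zero, overrides it for the QUALITY phase only.
type Timeouts struct {
	Delta          time.Duration
	QualityTimeout time.Duration
}

// For returns how long the given phase waits before timing out.
// Assumption: a phase waits exactly Delta unless an override applies.
func (t Timeouts) For(p Phase) time.Duration {
	if p == Quality && t.QualityTimeout > 0 {
		return t.QualityTimeout
	}
	return t.Delta
}

// WorstCase sums the timeouts of the given phases, i.e. the time spent
// if every one of them runs until its timeout expires.
func (t Timeouts) WorstCase(phases ...Phase) time.Duration {
	var total time.Duration
	for _, p := range phases {
		total += t.For(p)
	}
	return total
}

func main() {
	phases := []Phase{Quality, Converge, Prepare}

	// Option 1: raise delta for everyone so QUALITY gets 9s.
	largerDelta := Timeouts{Delta: 9 * time.Second}
	// Option 2: keep delta at 3s and give QUALITY its own 9s timeout.
	dedicated := Timeouts{Delta: 3 * time.Second, QualityTimeout: 9 * time.Second}

	fmt.Println("larger delta:", largerDelta.WorstCase(phases...))
	fmt.Println("dedicated QUALITY timeout:", dedicated.WorstCase(phases...))
}
```

With these illustrative numbers, QUALITY gets the same 9 seconds in both cases, but the worst case over the three phases is 27 seconds with the larger delta and 15 seconds with the dedicated timeout.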

The rationale for adding this dedicated timeout to QUALITY only, rather than to both CONVERGE and QUALITY, is as follows. If QUALITY messages are sufficiently propagated and we still hit CONVERGE, then chances are the chain is too forky, and we are better off finalising on base and starting a new instance with a fresh proposal than trying to finalise on a non-base value in the current instance. A faster CONVERGE followed by a fresh instance is therefore likely to yield higher overall progress velocity than delaying CONVERGE.

Of course, we can test this thesis at scale by introducing a delay for both QUALITY and CONVERGE.

@BigLep
Member

BigLep commented Dec 18, 2024

2024-12-18 standup notes:

  • agreed it's more elegant if it's a multiplier. We'll do a float multiplier.
  • this will be exposed for passive testing (will be in the manifest)
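
A minimal Go sketch of what a float multiplier could look like. The function name `qualityTimeout` and its parameters are hypothetical, and treating a non-positive multiplier as "no change" is an assumption made here, not something decided in the standup.

```go
package main

import (
	"fmt"
	"time"
)

// qualityTimeout scales the base phase timeout by a float multiplier to
// obtain the QUALITY phase timeout. Names and fallback rule are illustrative.
func qualityTimeout(base time.Duration, multiplier float64) time.Duration {
	if multiplier <= 0 {
		// Assumption: an unset or invalid multiplier leaves the timeout unchanged.
		return base
	}
	return time.Duration(float64(base) * multiplier)
}

func main() {
	base := 6 * time.Second
	fmt.Println(qualityTimeout(base, 1.0)) // unchanged
	fmt.Println(qualityTimeout(base, 1.5)) // QUALITY waits 50% longer
}
```

A multiplier keeps the QUALITY timeout tied to delta, so a later change to delta carries over to QUALITY without a second value having to be retuned.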

@BigLep BigLep moved this from Todo to In progress in F3 Dec 18, 2024
@Kubuxu Kubuxu linked a pull request Dec 19, 2024 that will close this issue
@BigLep BigLep moved this from In progress to In review in F3 Dec 19, 2024
@github-project-automation github-project-automation bot moved this from In review to Done in F3 Dec 20, 2024