Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Conversation

@coriolinus
Copy link
Contributor

@coriolinus coriolinus commented Mar 8, 2021

Migration of #7976 to multi-phase-elections.

Fix issues such as paritytech/stakingops#130

Our OCW runner in staking is too conservative now. It only runs once at the beginning of the election window, and only attempts to submit once. If for any reason (e.g. secondary/primary slot overlap, fork and rollback) all validators temporarily think that a solution is submitted, they will all drop their computed solution and never try again. In essence, they will attempt to submit it, which will fail because there is already a good solution on-chain, and the process ends. In this scenario, if the submitted solution gets reverted, we are left with nothing, and thus a heavy on-chain election.

This is a minimal patch to fix the situation. OCW will compute, save and submit the solution initially, and until the end of the election window periodically try to re-submitted the saved solution, if none is stored yet.

polkadot companion: paritytech/polkadot#2648

@coriolinus coriolinus self-assigned this Mar 8, 2021
@github-actions github-actions bot added the A3-in_progress Pull request is in progress. No review needed at this stage. label Mar 8, 2021
@coriolinus coriolinus changed the title Prgn runtime resubmit Multi-phase elections solution resubmission Mar 8, 2021
@coriolinus coriolinus added B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Mar 8, 2021
@coriolinus coriolinus marked this pull request as ready for review March 10, 2021 11:30
@github-actions github-actions bot added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Mar 10, 2021
@coriolinus coriolinus added the D9-needsaudit 👮 PR contains changes to fund-managing logic that should be properly reviewed and externally audited label Mar 10, 2021
/// Save a given call into OCW storage.
fn save_solution<T: Config>(call: &Call<T>) {
let storage = StorageValueRef::persistent(&OFFCHAIN_CACHED_CALL);
storage.set(&call);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? We don't care about the current value, so what benefit is there in reading it? We're not using the get-check-set pattern mentioned in the docs.

Copy link
Contributor

@kianenigma kianenigma Mar 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7976 (comment)

let me know if it is unclear, it is an important thing to grasp, the overlapping possibility of OCWs.

Copy link
Contributor Author

@coriolinus coriolinus Mar 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very surprising that set would allow for reads of an incomplete write, even from a separate, parallel worker. I traced the function calls back a bit, and ended up in trait sp_core::offchain::Externalities. For that trait, local_storage_set requires &mut self.

Obviously, we can run OCWs in parallel, so the storage needs to be shared. I haven't traced it back exhaustively, but what I'd expect to find is something like RwLock<dyn Externalities>. Given that, an OCW which wants to read the storage has to wait until its peer with the write lock is completely finished writing, before it has a chance to read.

I believe that the atomic nature of the mutate method is to avoid this scenario:

  • OCW A: read storage Foo
  • OCW B: update storage Foo
  • OCW A: update storage Foo based on its then-current value, clobbering what B wrote

AFAICT, normal mutable aliasing rules prevent OCW B from ever reading while OCW A is in the middle of writing. If I'm right, then we really don't need mutate here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally happy to accept what you are proposing, if you actually find out that it is designed and works as you hypothesize, not based on numerous assumptions that you are making.

I am not sure about any of this myself, to be frank, and generally prefer consulting and the trusting the opinion of someone who actually does (Tomek in the case of the linked discussion). You can probably ping him if you want to clarify this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update?

(learning from the past, I think you should just change this 1 line to a 1 liner that uses mutate and call it a day and be safe and save time as well.)

Copy link
Contributor Author

@coriolinus coriolinus Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ok. I just can't find the actual implementation of offchain storage anywhere; everything seems to be abstractions that layer on abstractions. I'll just use mutate here. 8b31773

@coriolinus
Copy link
Contributor Author

All requested items except burnin have now been completed. There's some more work necessary before this can be merged--burnin and CI--but this is in a good place for review now.

@coriolinus coriolinus added the D9-needsaudit 👮 PR contains changes to fund-managing logic that should be properly reviewed and externally audited label Apr 20, 2021
let storage = StorageValueRef::persistent(&OFFCHAIN_LOCK);

let mutate_stat =
storage.mutate::<_, &'static str, _>(|maybe_head: Option<Option<T::BlockNumber>>| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sidenote about the code below:

As far as I know, we run offchain worker only on the best block, if we do reorg we don't run the offchain worker on every block in the reorg.

If this is true then we could run the offchain worker when now is less than head on forks, because if we consider another chain to be the best then we should submit to this chain, no matter what we did in a previous fork. no ?

Copy link
Contributor

@kianenigma kianenigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@gui1117 gui1117 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@jakoblell jakoblell added D1-audited 👍 PR contains changes to fund-managing logic that has been properly reviewed and externally audited and removed D9-needsaudit 👮 PR contains changes to fund-managing logic that should be properly reviewed and externally audited labels Apr 29, 2021
@coriolinus
Copy link
Contributor Author

bot merge

@ghost
Copy link

ghost commented May 3, 2021

Waiting for commit status.

@ghost
Copy link

ghost commented May 3, 2021

Merge aborted: Checks failed for 6ae73c4

@coriolinus
Copy link
Contributor Author

bot merge

@ghost
Copy link

ghost commented May 3, 2021

Trying merge.

@ghost ghost merged commit 92c15dd into master May 3, 2021
@ghost ghost deleted the prgn-runtime-resubmit branch May 3, 2021 10:49
bkchr pushed a commit to paritytech/polkadot that referenced this pull request May 3, 2021
* Companion for Multi-phase elections solution resubmission

paritytech/substrate#8290

* update Substrate

Co-authored-by: parity-processbot <>
kianenigma added a commit that referenced this pull request May 15, 2021
kianenigma added a commit that referenced this pull request May 15, 2021
nazar-pc pushed a commit to autonomys/substrate that referenced this pull request Aug 8, 2021
* not climate

* explain the intent of the bool in the unsigned phase

* remove glob imports from unsigned.rs

* add OffchainRepeat parameter to ElectionProviderMultiPhase

* migrate core logic from paritytech#7976

This is a much smaller diff than that PR contained, but I think
it contains all the essentials.

* improve formatting

* fix test build failures

* cause test to pass

* Apply suggestions from code review

Co-authored-by: Kian Paimani <[email protected]>

* collapse imports

* threshold acquired directly within try_acquire_offchain_lock

* add test of resubmission after interval

* add test that ocw can regenerate a failed cache when resubmitting

* ensure that OCW solutions are of the correct round

This should help prevent stale cached solutions from persisting
past the election for which they are intended.

* add test of pre-dispatch round check

* use `RawSolution.round` instead of redundantly externally

* unpack imports

Co-authored-by: Kian Paimani <[email protected]>

* rename `OFFCHAIN_HEAD_DB` -> `OFFCHAIN_LOCK`

* rename `mine_call` -> `mine_checked_call`

* eliminate extraneous comma

* check cached call is current before submitting

* remove unused consts introduced by bad merge.

Co-authored-by: Guillaume Thiolliere <[email protected]>

* resubmit when our solution beats queued solution

* clear call cache if solution fails to submit

* use local storage; clear on ElectionFinalized

* Revert "use local storage; clear on ElectionFinalized"

This reverts commit 4b46a93.

* BROKEN: try to filter local events in OCW

* use local storage; simplify score fetching

* fix event filter

* mutate storage instead of setting it

* StorageValueRef::local isn't actually implemented yet

* add logging for some events of interest in OCW miner

* rename kill_solution -> kill_ocw_solution to avoid ambiguity

* defensive err instead of unreachable given unreachable code

* doc punctuation

Co-authored-by: Kian Paimani <[email protected]>

* distinguish miner errors between "out of date" and "call invalid"

* downgrade info logs -> debug

* ensure encoded call decodes as a call

* fix semantics of validation of pre-dispatch failure for wrong round

* move score check within `and_then`

* add test that offchain workers clear their cache after election

* ensure that bad ocw submissions are not retained for resubmission

* simplify fn ocw_solution_exists

* add feasibility check when restoring cached solution

should address https://github.com/paritytech/substrate/pull/8290/files#r617533358

restructures how the checks are sequenced, which simplifies legibility.

* simplify checks again

Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: Guillaume Thiolliere <[email protected]>
ggwpez pushed a commit to ggwpez/runtimes that referenced this pull request Mar 10, 2023
* Companion for Multi-phase elections solution resubmission

paritytech/substrate#8290

* update Substrate

Co-authored-by: parity-processbot <>
ggwpez pushed a commit to ggwpez/runtimes that referenced this pull request Jul 13, 2023
* Companion for Multi-phase elections solution resubmission

paritytech/substrate#8290

* update Substrate

Co-authored-by: parity-processbot <>
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D1-audited 👍 PR contains changes to fund-managing logic that has been properly reviewed and externally audited

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

10 participants