Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
54 commits
Select commit Hold shift + click to select a range
62e1854
not climate
coriolinus Mar 5, 2021
9a8c4a2
explain the intent of the bool in the unsigned phase
coriolinus Mar 5, 2021
19d7251
remove glob imports from unsigned.rs
coriolinus Mar 5, 2021
02ad14b
add OffchainRepeat parameter to ElectionProviderMultiPhase
coriolinus Mar 5, 2021
f6a0fd9
migrate core logic from #7976
coriolinus Mar 8, 2021
59a0fb4
improve formatting
coriolinus Mar 8, 2021
4612d07
fix test build failures
coriolinus Mar 8, 2021
c8ccbe9
cause test to pass
coriolinus Mar 10, 2021
7e4408f
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Mar 10, 2021
f4b50fc
Apply suggestions from code review
coriolinus Mar 16, 2021
a192ba3
collapse imports
coriolinus Mar 16, 2021
81ceca1
threshold acquired directly within try_acquire_offchain_lock
coriolinus Mar 16, 2021
9023b61
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Mar 16, 2021
449b799
add test of resubmission after interval
coriolinus Mar 17, 2021
85f715e
add test that ocw can regenerate a failed cache when resubmitting
coriolinus Mar 17, 2021
c6ecab8
ensure that OCW solutions are of the correct round
coriolinus Mar 19, 2021
ad3b786
add test of pre-dispatch round check
coriolinus Mar 19, 2021
bee456c
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Mar 19, 2021
89cf445
use `RawSolution.round` instead of redundantly externally
coriolinus Mar 22, 2021
458682b
unpack imports
coriolinus Mar 22, 2021
d7dbcc0
rename `OFFCHAIN_HEAD_DB` -> `OFFCHAIN_LOCK`
coriolinus Mar 22, 2021
b3e4950
rename `mine_call` -> `mine_checked_call`
coriolinus Mar 22, 2021
0b77268
eliminate extraneous comma
coriolinus Mar 22, 2021
576f87c
check cached call is current before submitting
coriolinus Mar 22, 2021
d0f031a
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Mar 22, 2021
681e6f0
remove unused consts introduced by bad merge.
coriolinus Mar 24, 2021
0fac518
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Mar 25, 2021
276a06a
Merge branch 'prgn-runtime-resubmit' of github.com:paritytech/substra…
coriolinus Mar 25, 2021
ea3fc29
resubmit when our solution beats queued solution
coriolinus Mar 29, 2021
cc70d37
clear call cache if solution fails to submit
coriolinus Mar 29, 2021
4b46a93
use local storage; clear on ElectionFinalized
coriolinus Mar 29, 2021
224d5e1
Revert "use local storage; clear on ElectionFinalized"
coriolinus Mar 29, 2021
c5668e9
BROKEN: try to filter local events in OCW
coriolinus Mar 29, 2021
e9305be
use local storage; simplify score fetching
coriolinus Mar 29, 2021
fe84141
fix event filter
coriolinus Mar 29, 2021
8b31773
mutate storage instead of setting it
coriolinus Mar 29, 2021
9978693
StorageValueRef::local isn't actually implemented yet
coriolinus Mar 30, 2021
ebcdeaa
add logging for some events of interest in OCW miner
coriolinus Mar 30, 2021
0236288
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Mar 30, 2021
bd7a195
rename kill_solution -> kill_ocw_solution to avoid ambiguity
coriolinus Apr 1, 2021
1c896be
defensive err instead of unreachable given unreachable code
coriolinus Apr 1, 2021
2243501
doc punctuation
coriolinus Apr 1, 2021
2b5d050
distinguish miner errors between "out of date" and "call invalid"
coriolinus Apr 1, 2021
5eb2673
downgrade info logs -> debug
coriolinus Apr 1, 2021
f12d59d
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus Apr 19, 2021
70b4f71
ensure encoded call decodes as a call
coriolinus Apr 20, 2021
9c28048
fix semantics of validation of pre-dispatch failure for wrong round
coriolinus Apr 20, 2021
3f1109b
move score check within `and_then`
coriolinus Apr 20, 2021
af2d2c6
add test that offchain workers clear their cache after election
coriolinus Apr 20, 2021
fc71d3d
ensure that bad ocw submissions are not retained for resubmission
coriolinus Apr 21, 2021
840bad3
simplify fn ocw_solution_exists
coriolinus Apr 23, 2021
8eaf481
add feasibility check when restoring cached solution
coriolinus Apr 23, 2021
f4b52f0
simplify checks again
coriolinus Apr 23, 2021
6ae73c4
Merge remote-tracking branch 'origin/master' into prgn-runtime-resubmit
coriolinus May 3, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions bin/node/runtime/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -476,6 +476,7 @@ parameter_types! {
.get(DispatchClass::Normal)
.max_extrinsic.expect("Normal extrinsics have a weight limit configured; qed")
.saturating_sub(BlockExecutionWeight::get());
pub OffchainRepeat: BlockNumber = 5;
}

impl pallet_staking::Config for Runtime {
Expand Down Expand Up @@ -537,6 +538,7 @@ impl pallet_election_provider_multi_phase::Config for Runtime {
type SignedPhase = SignedPhase;
type UnsignedPhase = UnsignedPhase;
type SolutionImprovementThreshold = MinSolutionScoreBump;
type OffchainRepeat = OffchainRepeat;
type MinerMaxIterations = MinerMaxIterations;
type MinerMaxWeight = MinerMaxWeight;
type MinerTxPriority = MultiPhaseUnsignedPriority;
Expand Down
59 changes: 40 additions & 19 deletions frame/election-provider-multi-phase/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -192,10 +192,6 @@
//! **Score based on (byte) size**: We should always prioritize small solutions over bigger ones, if
//! there is a tie. Even more harsh should be to enforce the bound of the `reduce` algorithm.
//!
//! **Offchain resubmit**: Essentially port <https://github.com/paritytech/substrate/pull/7976> to
//! this pallet as well. The `OFFCHAIN_REPEAT` also needs to become an adjustable parameter of the
//! pallet.
//!
//! **Make the number of nominators configurable from the runtime**. Remove `sp_npos_elections`
//! dependency from staking and the compact solution type. It should be generated at runtime, there
//! it should be encoded how many votes each nominators have. Essentially translate
Expand All @@ -213,7 +209,7 @@ use frame_support::{
use frame_system::{ensure_none, offchain::SendTransactionTypes};
use sp_election_providers::{ElectionDataProvider, ElectionProvider, onchain};
use sp_npos_elections::{
assignment_ratio_to_staked_normalized, is_score_better, CompactSolution, ElectionScore,
assignment_ratio_to_staked_normalized, CompactSolution, ElectionScore,
EvaluateSupport, PerThing128, Supports, VoteWeight,
};
use sp_runtime::{
Expand Down Expand Up @@ -291,8 +287,16 @@ pub enum Phase<Bn> {
Off,
/// Signed phase is open.
Signed,
/// Unsigned phase. First element is whether it is open or not, second the starting block
/// Unsigned phase. First element is whether it is active or not, second the starting block
/// number.
///
/// We do not yet check whether the unsigned phase is active or passive. The intent is for the
/// blockchain to be able to declare: "I believe that there exists an adequate signed solution,"
/// advising validators not to bother running the unsigned offchain worker.
///
/// As validator nodes are free to edit their OCW code, they could simply ignore this advisory
/// and always compute their own solution. However, by default, when the unsigned phase is passive,
/// the offchain workers will not bother running.
Unsigned((bool, Bn)),
}

Expand All @@ -303,27 +307,27 @@ impl<Bn> Default for Phase<Bn> {
}

impl<Bn: PartialEq + Eq> Phase<Bn> {
/// Weather the phase is signed or not.
/// Whether the phase is signed or not.
pub fn is_signed(&self) -> bool {
matches!(self, Phase::Signed)
}

/// Weather the phase is unsigned or not.
/// Whether the phase is unsigned or not.
pub fn is_unsigned(&self) -> bool {
matches!(self, Phase::Unsigned(_))
}

/// Weather the phase is unsigned and open or not, with specific start.
/// Whether the phase is unsigned and open or not, with specific start.
pub fn is_unsigned_open_at(&self, at: Bn) -> bool {
matches!(self, Phase::Unsigned((true, real)) if *real == at)
}

/// Weather the phase is unsigned and open or not.
/// Whether the phase is unsigned and open or not.
pub fn is_unsigned_open(&self) -> bool {
matches!(self, Phase::Unsigned((true, _)))
}

/// Weather the phase is off or not.
/// Whether the phase is off or not.
pub fn is_off(&self) -> bool {
matches!(self, Phase::Off)
}
Expand Down Expand Up @@ -514,6 +518,13 @@ pub mod pallet {
#[pallet::constant]
type SolutionImprovementThreshold: Get<Perbill>;

/// The repeat threshold of the offchain worker.
///
/// For example, if it is 5, that means that at least 5 blocks will elapse between attempts
/// to submit the worker's solution.
#[pallet::constant]
type OffchainRepeat: Get<Self::BlockNumber>;

/// The priority of the unsigned transaction submitted in the unsigned-phase
type MinerTxPriority: Get<TransactionPriority>;
/// Maximum number of iteration of balancing that will be executed in the embedded miner of
Expand Down Expand Up @@ -595,16 +606,26 @@ pub mod pallet {
}
}

fn offchain_worker(n: T::BlockNumber) {
// We only run the OCW in the first block of the unsigned phase.
if Self::current_phase().is_unsigned_open_at(n) {
match Self::try_acquire_offchain_lock(n) {
Ok(_) => {
let outcome = Self::mine_check_and_submit().map_err(ElectionError::from);
log!(info, "miner exeuction done: {:?}", outcome);
fn offchain_worker(now: T::BlockNumber) {
let threshold = T::OffchainRepeat::get();
match Self::current_phase() {
Phase::Unsigned((true, opened)) if opened == now => {
// mine a new solution, cache it, and attempt to submit it
let initial_output = Self::try_acquire_offchain_lock(now, threshold)
.and_then(|_| Self::mine_check_save_submit());
log!(info, "initial OCW output at {:?}: {:?}", now, initial_output);
}
Phase::Unsigned((true, opened)) if opened < now => {
if !<QueuedSolution<T>>::exists() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe instead of accepting any solution there, we should consider the score of the solution.

What if a validator submit a quite low score solution, and all other validator somehow miss their submition ? (I don't know exactly in which context a validator can miss his submittion, I expect this happens when he tried to submit to fork instead of the final block chain).
In this case no not malicious validator can submit a good solution.
I'm not sure how to solve this though.

Maybe we should always compute a solution at least one time. And compare here with the on-chain one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#8253 (currently draft, not yet ready for review) is intended to address exactly this issue. It lets people challenge a bad solution.

Copy link
Contributor

@gui1117 gui1117 Mar 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I think is that if the validator has generated one solution offchain he can quickly check that his solution is better or worse that the queued one.
Considering that the validator should have generated a solution when opened == now, I think we should be able to quickly check from this generated solution.
Or if he didn't because he generated for a fork. Then we can force validator to generate at one solution per finalized chain maybe.

Currently we do:

  • generate one solution (but it might be for a fork) and try submit it.
  • if there is no solution on chain after N blocks: retry to generate a solution or take the previously generated one and try submit it.

I wonder why we don't do:

  • generate one solution (but it might be for a fork) and try submit it.
  • after N blocks: retry to generate a solution or take the previously generated one and try submit it if it has a better score than the on-chain one.

EDIT: IIRC challenge phase prevent from too bad solution to be submitted, but what if the solution is not so bad but not so good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning there is that it's wasteful. We always submit at the first opportunity. That means that the only time a continuous resubmit like that would be valuable is when all of

  • our initial solution wasn't accepted, probably due to network issues / cosmic rays
  • someone else submitted a solution which is provably fair (passes PJR check)
  • but ours is better (higher score)

This feels to me like it fits at the intersection of low-probability-of-occurring and low-benefit-if-implemented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm actually my thought were confused. We save the solution in a persistent offchain storage. Thus if we generate a solution for the wrong election data snapshot. Our solution is still bad.
The election data snapshot can be different between forks if the fork happened before the last changed vote.

If it happens that we generated the solution for a different snapshot I suggest that the node try to run the election again for the new snapshot and compare the solution score with the on-chain one.

I think nodes should generate a solution for all the election data snapshots.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I properly understand your proposed mitigation. Are you saying that we should be using local instead of persistent, because that's documented as fork-aware?

Yes and no in case we do the option1 from Kian, we need either to make use of fork-aware storage, or store the validator/nominators sets on which we computed the solution and compare with the current one.
Or test that our solution is valid, if it is not valid then it is outdated and we need to compute one again.

If we can easily afford computing 5 solutions then option2 from Kian seems ok.

For me in option1 we should be able to know if our solution is outdated because of fork. So it is enough to mitigate this situations.
for option2 that also allow us to recompute solutions in case our score is bad for some reason. Seems easier but not strictly needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ea3fc29 implements option 1. I see that you like option 2, but I was already working on this when that message came in. If you like, I'll revert it and implement option 2 instead.

Hmm what I meant was more like this:

if !<QueuedSolution<T>>::exists || restore_solution::<T>().map_or(true, |call| call.score.is_better(QueuedSolution<T>::get().score)) {
// syntax is a bit simplified, will need a rather ugly match on call. 
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never used local, but based on the documentation it is exactly what we need. If a solution is computed in a fork, then it should not be visible once we re-org out of that fork.

What I would be curious in this case is what happens to that data? if it is not wiped, we should make sure it is cleaned at some point.

In either case, it might be a good idea to clear all OCW storage once ElectionFinalized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what I meant was more like this

I think the only difference between what you wrote there and what ea3fc29 implements is when we perform the queued solution check. Logically, it still only submits if the queued solution does not exist or if the local solution has a better score than what has been queued on-chain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, it still only submits if the queued solution does not exist or if the local solution has a better score than what has been queued on-chain.

Yeah my bad, didn't check the rest of the code. Also good call to add maybe to the function name.

// as long as there is no feasible solution, keep trying to submit ours
//
// the offchain_lock prevents us from spamming submissions too often.
let resubmit_output = Self::try_acquire_offchain_lock(now, threshold)
.and_then(|_| Self::restore_or_compute_then_submit());
log!(info, "resubmit OCW output at {:?}: {:?}", now, resubmit_output);
}
Err(why) => log!(warn, "denied offchain worker: {:?}", why),
}
_ => {}
}
}

Expand Down
4 changes: 3 additions & 1 deletion frame/election-provider-multi-phase/src/mock.rs
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@ frame_support::construct_runtime!(

pub(crate) type Balance = u64;
pub(crate) type AccountId = u64;
pub(crate) type BlockNumber = u32;

sp_npos_elections::generate_solution_type!(
#[compact]
Expand Down Expand Up @@ -201,10 +202,10 @@ parameter_types! {
pub static MinerMaxIterations: u32 = 5;
pub static MinerTxPriority: u64 = 100;
pub static SolutionImprovementThreshold: Perbill = Perbill::zero();
pub static OffchainRepeat: BlockNumber = 5;
pub static MinerMaxWeight: Weight = BlockWeights::get().max_block;
pub static MockWeightInfo: bool = false;


pub static EpochLength: u64 = 30;
}

Expand Down Expand Up @@ -265,6 +266,7 @@ impl crate::Config for Runtime {
type SignedPhase = SignedPhase;
type UnsignedPhase = UnsignedPhase;
type SolutionImprovementThreshold = SolutionImprovementThreshold;
type OffchainRepeat = OffchainRepeat;
type MinerMaxIterations = MinerMaxIterations;
type MinerMaxWeight = MinerMaxWeight;
type MinerTxPriority = MinerTxPriority;
Expand Down
Loading