Staking-Async + EPMB: Migrate operations to poll #9925
Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin backport-9925-to-unstable2507
git worktree add --checkout .worktree/backport-9925-to-unstable2507 backport-9925-to-unstable2507
cd .worktree/backport-9925-to-unstable2507
git reset --hard HEAD^
git cherry-pick -x 05a3fb107e488378075e956186df34d04c1bf656
git push --force-with-lease

Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin backport-9925-to-stable2509
git worktree add --checkout .worktree/backport-9925-to-stable2509 backport-9925-to-stable2509
cd .worktree/backport-9925-to-stable2509
git reset --hard HEAD^
git cherry-pick -x 05a3fb107e488378075e956186df34d04c1bf656
git push --force-with-lease

Created backport PR for
Please cherry-pick the changes locally and resolve any conflicts.
git fetch origin backport-9925-to-stable2512
git worktree add --checkout .worktree/backport-9925-to-stable2512 backport-9925-to-stable2512
cd .worktree/backport-9925-to-stable2512
git reset --hard HEAD^
git cherry-pick -x 05a3fb107e488378075e956186df34d04c1bf656
git push --force-with-lease

This PR moves all operations related to staking elections from a
mandatory `on_initialize`, which takes no account of weight, to an
optional `on_poll` with accurate, pre-execution weight checking.
## Why
* `on_initialize` is a mandatory hook. If a single parachain block
happens to contain too many of them, this block can never be authored
and imported. Solo/relay chains are more forgiving, as you would
have one slow block instead of an indefinite stall.
* For example, message-queue XCMs, the scheduler and MBMs might overlap with
the staking `on_initialize` in AH (unlikely, but entirely possible), and
put the chain at risk.
* In contrast, `poll` hooks:
  * Might not be called at all by `frame-executive` (e.g. during MBMs)
  * Have access to a clear `WeightMeter`, allowing the subject to
  decide whether to proceed or not.
## Functional Changes
As the minimal diff in existing tests shows, this change is almost a
noop in the absence of weight scarcity. The only difference is
that the start signal from the signed pallet to the verifier pallet is
now sent at the end of the signed phase, not at the beginning of the
signed validation phase.
## Non-Functional Changes
* Now, the only pallets that call `on_poll` are `multi_block` (the
parent, not verifier and signed), and `staking_async`. This makes the
code easier to audit.
* Removes a lot of `on_initialize` terminology from weight functions.
* Cleans up some stale variations in the mock setup that allowed us to skip
the signed pallet's on-initialize. This no longer makes sense, as the
parent pallet is the only one that calls `on_poll`.
#### More Test Changes
During this PR, I found multiple instances where we are rolling
forward the wrong number of blocks in the EPMB tests. For example, to verify
a solution, 3 blocks are needed, but we are calling `roll_next` 4 times,
and the test is still passing. To harden such cases and make sure all
future tests are as explicit as possible, I have:
* Made the generic `fn roll_next` fully private. In tests it should be
replaced by:
  * `fn roll_next_and_phase(expect: Phase<T>)`
  * `fn roll_next_and_phase_verifier(expect: Phase<T>, status: Status)`
  * `fn roll_next_and_verifier(status: Status)`
* This forces all tests to explicitly state what the expected
`Phase`/`Status` should be after moving a block forward.

All tests are updated to respect this paradigm, which has made the diff
slightly larger than I wished it to be.
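
The idea behind the stricter helpers can be sketched as follows. This is a toy model, not the actual EPMB mock: `MockChain`, the non-generic `Phase` enum and its transition rule are all assumptions made up for illustration. Only the shape of `roll_next_and_phase` (advance one block, then assert the expected phase) follows the description above.

```rust
/// Toy, non-generic stand-in for the pallet's `Phase<T>` (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    Off,
    Snapshot(u32),
    Signed(u32),
}

/// Hypothetical mock chain that tracks only a block number and a phase.
pub struct MockChain {
    pub block: u32,
    pub phase: Phase,
}

impl MockChain {
    pub fn new() -> Self {
        Self { block: 0, phase: Phase::Off }
    }

    /// Private on purpose: tests cannot advance a block without asserting.
    /// The transition rule below is invented for this sketch.
    fn roll_next(&mut self) {
        self.block += 1;
        self.phase = match self.phase {
            Phase::Off => Phase::Snapshot(1),
            Phase::Snapshot(0) => Phase::Signed(2),
            Phase::Snapshot(n) => Phase::Snapshot(n - 1),
            Phase::Signed(0) => Phase::Off,
            Phase::Signed(n) => Phase::Signed(n - 1),
        };
    }

    /// Advance exactly one block and state what phase we expect to be in.
    pub fn roll_next_and_phase(&mut self, expect: Phase) {
        self.roll_next();
        assert_eq!(self.phase, expect, "unexpected phase at block {}", self.block);
    }
}
```

With this shape, a test that rolls one block too many fails at the first block whose phase does not match, instead of passing by accident.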
### Implementation/Review Notes
#### Overall Design
The overall idea is to move all operations to a model similar to
dispatchables, where before executing `f(input) -> Result`, we have
access to a `w(input) -> Weight` that gives us the pre-execution weight.
If the pre-execution weight fits in the available weight, we proceed with
executing. The execution may override the pre-execution weight with a
smaller value if it so wishes.
```rust
/// ### Type
///
/// The commonly used `(Weight, Box<dyn Fn(&mut WeightMeter)>)` should be interpreted as such:
///
/// * The `Weight` is the pre-computed worst case weight of the operation that we are going to
/// do.
/// * The `Box<dyn Fn(&mut WeightMeter)>` is the function that represents the work, which
/// will consume at most the said amount of weight. While executing, it will alter the given
/// weight meter to consume the actual weight used. Indeed, the weight that is registered in
/// the `WeightMeter` must never be more than the `Weight` returned as the first item of the
/// tuple.
///
/// In essence, the caller must:
///
/// 1. given an existing `meter`, receive `(worst_weight, exec)`
/// 2. ensure `meter` can consume up to `worst_weight`.
/// 3. if so, call `exec(meter)`, knowing `meter` will accumulate at most `worst_weight` extra.
fn per_block_exec(current_phase: Phase<T>) -> (Weight, Box<dyn Fn(&mut WeightMeter)>) {
...
}
```
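
The caller side of this contract might look like the sketch below. It does not use the real `frame` types: `Weight` and `WeightMeter` here are simplified, single-dimension stand-ins, and `per_block_exec_sketch`, `poll_sketch` and the numbers 100 and 60 are hypothetical. Only the three steps (receive, check, execute) follow the doc comment above.

```rust
/// Simplified, single-dimension stand-in for the real `Weight` (assumption).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Weight(pub u64);

/// Simplified stand-in for `WeightMeter` (assumption).
pub struct WeightMeter {
    pub consumed: u64,
    pub limit: u64,
}

impl WeightMeter {
    pub fn with_limit(limit: u64) -> Self {
        Self { consumed: 0, limit }
    }
    /// True if `w` still fits under the limit.
    pub fn can_consume(&self, w: Weight) -> bool {
        self.consumed.saturating_add(w.0) <= self.limit
    }
    pub fn consume(&mut self, w: Weight) {
        self.consumed = self.consumed.saturating_add(w.0);
    }
}

pub type ExecuteFn = Box<dyn Fn(&mut WeightMeter)>;

/// Hypothetical operation: worst case is 100, the work actually uses 60.
pub fn per_block_exec_sketch() -> (Weight, ExecuteFn) {
    let worst = Weight(100);
    let exec: ExecuteFn = Box::new(|meter: &mut WeightMeter| {
        // ... the real work would happen here ...
        // Register the actual weight, which must not exceed `worst`.
        meter.consume(Weight(60));
    });
    (worst, exec)
}

/// Returns `true` if the operation was executed in this block.
pub fn poll_sketch(meter: &mut WeightMeter) -> bool {
    // 1. receive `(worst_weight, exec)`
    let (worst, exec) = per_block_exec_sketch();
    // 2. ensure the meter can take the worst case; otherwise skip this block
    if !meter.can_consume(worst) {
        return false;
    }
    // 3. execute; the meter accumulates at most `worst`
    exec(meter);
    true
}
```

Note that the check is made against the worst case, so a meter with 80 units left skips the operation even though the work would have used only 60.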
#### Export Weight
Through this PR, I realized that we previously were never registering
the weight of the export process. This is because the export is managed
by the staking pallet, and previously it had no way to know what the
weight of each export step is.
Now, we alter the `ElectionProvider::status` interface such that we not
only signal whether we are ready or not, but also signal _we are ready,
and this is the weight of the next `elect`_.
```rust
fn status() -> Result<Option<Weight>, ()> {
    match <CurrentPhase<T>>::get() {
        // we're not doing anything.
        Phase::Off => Err(()),
        // we're doing something but not ready.
        Phase::Signed(_) |
        Phase::SignedValidation(_) |
        Phase::Unsigned(_) |
        Phase::Snapshot(_) |
        Phase::Emergency => Ok(None),
        // we're ready, and this is the weight of the next step
        Phase::Done => Ok(Some(T::WeightInfo::export_non_terminal())),
        Phase::Export(p) =>
            if p.is_zero() {
                Ok(Some(T::WeightInfo::export_terminal()))
            } else {
                Ok(Some(T::WeightInfo::export_non_terminal()))
            },
    }
}
```
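
On the consuming side, the three outcomes of `status` could be handled roughly as below. How the staking pallet actually consumes this value is not shown in this description, so `maybe_fetch_next_page`, the reduced `Phase` enum, the stand-in `Weight`/`WeightMeter` types and the weight constants are all assumptions for illustration.

```rust
/// Simplified stand-ins for the real `Weight` and `WeightMeter` (assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Weight(pub u64);

pub struct WeightMeter {
    pub consumed: u64,
    pub limit: u64,
}

impl WeightMeter {
    pub fn with_limit(limit: u64) -> Self {
        Self { consumed: 0, limit }
    }
    pub fn can_consume(&self, w: Weight) -> bool {
        self.consumed.saturating_add(w.0) <= self.limit
    }
    pub fn consume(&mut self, w: Weight) {
        self.consumed = self.consumed.saturating_add(w.0);
    }
}

/// Reduced phase set, enough to show all three outcomes (assumption).
#[derive(Debug)]
pub enum Phase {
    Off,
    Signed,
    Done,
    Export(u32),
}

// Made-up weights for the sketch.
pub const EXPORT_NON_TERMINAL: Weight = Weight(40);
pub const EXPORT_TERMINAL: Weight = Weight(70);

/// Mirrors the shape of `status` above, on the reduced phase set.
pub fn status(phase: &Phase) -> Result<Option<Weight>, ()> {
    match phase {
        Phase::Off => Err(()),
        Phase::Signed => Ok(None),
        Phase::Done => Ok(Some(EXPORT_NON_TERMINAL)),
        Phase::Export(0) => Ok(Some(EXPORT_TERMINAL)),
        Phase::Export(_) => Ok(Some(EXPORT_NON_TERMINAL)),
    }
}

/// Hypothetical caller: only proceeds with the next export step if the
/// provider is ready and the advertised weight fits in the meter.
pub fn maybe_fetch_next_page(phase: &Phase, meter: &mut WeightMeter) -> bool {
    match status(phase) {
        Ok(Some(weight)) if meter.can_consume(weight) => {
            // ... the call to `elect` would happen here ...
            meter.consume(weight);
            true
        },
        // Not running, not ready, or not enough weight left: try again later.
        _ => false,
    }
}
```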
## Integration
The only breaking change of this PR is:
```
impl multi_block::Config for Runtime {
    // ..
    type Signed = multi_block_signed::Pallet<Self>;
}
```
While not mandatory, the fellowship runtimes should use the new
`check_all_weights` function to test the weights.
## Path To Weight Refund
The usage of `WeightMeter` is intentional here to pave the way to
reclaiming the weight in a subsequent PR. It will look like this:
```diff
--- a/substrate/frame/election-provider-multi-block/src/lib.rs
+++ b/substrate/frame/election-provider-multi-block/src/lib.rs
@@ -1310,6 +1310,7 @@ impl<T: Config> Pallet<T> {
/// 2. ensure `meter` can consume up to `worst_weight`.
/// 3. if so, call `exec(meter)`, knowing `meter` will accumulate at most `worst_weight` extra.
fn per_block_exec(current_phase: Phase<T>) -> (Weight, Box<dyn Fn(&mut WeightMeter)>) {
+ use cumulus_primitives_storage_weight_reclaim::StorageWeightReclaimer;
type ExecuteFn = Box<dyn Fn(&mut WeightMeter)>;
let noop: (Weight, ExecuteFn) = (T::WeightInfo::per_block_nothing(), Box::new(|_| {}));
@@ -1318,8 +1319,9 @@ impl<T: Config> Pallet<T> {
// first snapshot
let weight = T::WeightInfo::per_block_snapshot_msp();
let exec: ExecuteFn = Box::new(move |meter: &mut WeightMeter| {
+ let mut reclaimer = StorageWeightReclaimer::new(meter);
Self::create_targets_snapshot();
- meter.consume(weight)
+ let _reclaimed = reclaimer.reclaim_with_meter(meter);
});
(weight, exec)
},
@@ -1328,8 +1330,9 @@ impl<T: Config> Pallet<T> {
// rest of the snapshot, incl last one.
let weight = T::WeightInfo::per_block_snapshot_rest();
let exec: ExecuteFn = Box::new(move |meter: &mut WeightMeter| {
+ let mut reclaimer = StorageWeightReclaimer::new(meter);
Self::create_voters_snapshot_paged(x);
- meter.consume(weight)
+ let _reclaimed = reclaimer.reclaim_with_meter(meter);
});
(weight, exec)
},
```
In short, instead of consuming the _worst case weight_, we consume the
accurate amount given to us by the weight reclaimer.
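
The difference between the two accounting strategies can be shown with a small sketch. It deliberately does not use the real `StorageWeightReclaimer`: a closure returning a measured weight stands in for it, and `Weight`, `WeightMeter`, `exec_worst_case` and `exec_with_reclaim` are hypothetical names. Capping the measured value at the worst case is a choice made here to respect the "never more than the pre-computed weight" rule stated earlier.

```rust
/// Simplified stand-ins for the real `Weight` and `WeightMeter` (assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Weight(pub u64);

pub struct WeightMeter {
    pub consumed: u64,
    pub limit: u64,
}

impl WeightMeter {
    pub fn with_limit(limit: u64) -> Self {
        Self { consumed: 0, limit }
    }
    pub fn consume(&mut self, w: Weight) {
        self.consumed = self.consumed.saturating_add(w.0);
    }
}

/// Current behaviour: do the work, then charge the full worst-case weight.
pub fn exec_worst_case(meter: &mut WeightMeter, worst: Weight, work: impl Fn()) {
    work();
    meter.consume(worst);
}

/// Sketch of the refund path: `work` returns the weight it actually used
/// (a stand-in for what a reclaimer would measure), and only that is charged.
pub fn exec_with_reclaim(meter: &mut WeightMeter, worst: Weight, work: impl Fn() -> Weight) {
    let measured = work();
    // Never register more than the pre-computed worst case.
    let actual = Weight(measured.0.min(worst.0));
    meter.consume(actual);
}
```

With a worst case of 100 and a measured use of 35, the first function charges 100 and the second charges 35, leaving the difference available to later work in the same block.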
## TODO
- [x] Unit tests
- [x] Unified integration tests for weights
- [x] weight for RoundRotation is missing (export -> off phase)
- [x] Run all papi-integration tests at the end once.
- [x] Weight update / closes #7714
- [ ] Test to ensure pallet ordering is not important anymore.
- [x] closes #8910
- [x] Upgrade block `on-init` -> `on-poll`
- [x] queue to audit
- [ ] audit done
---------
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…igned phase (#11156) on_initialize claimed on_initialize_into_signed weight (> ~3M proof_size) every block during the Signed phase, but the heavy work (loading voter snapshots) only happens once when transitioning from Snapshot into Signed. Use discriminant comparison to distinguish phase entry from same-phase ticks, falling back to on_initialize_nothing for the latter. A proper fix would be backporting #9925 into stable2512 - but this would be a bigger change not compatible with the tight timeline of 2.1.0 release on Polkadot/Kusama. --------- Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>