Staking-Async + EPMB: Migrate operations to `poll` by kianenigma · Pull Request #9925 · paritytech/polkadot-sdk

kianenigma · 2025-10-03T12:26:49Z

This PR moves all operations related to staking elections from a mandatory on_initialize, which takes no account of weight, to an optional on_poll with accurate, pre-execution weight checking.

Why

on_initialize is a mandatory hook. If a single parachain block happens to contain too many of them, that block can never be authored and imported. Solo and relay chains are more forgiving: the result is one slow block rather than an indefinite stall.
For example, message-queue XCMs, the scheduler and MBMs (multi-block migrations) might overlap with the staking on_initialize in AH (Asset Hub). This is unlikely, but entirely possible, and it puts the chain at risk.
In contrast, poll hooks:
- Might not be run at all by frame-executive (e.g. during MBMs)
- Have access to a clear WeightMeter, allowing the hook to decide whether to proceed or not.
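The difference between the two hook styles can be sketched without FRAME. In the sketch below, `Meter` is a simplified, hypothetical stand-in for `WeightMeter` (a single `u64` dimension, whereas the real type tracks more than one), and `on_initialize_style` / `on_poll_style` are illustrative names that do not appear in the PR.

```rust
/// Simplified stand-in for a weight meter (assumption: one dimension only).
#[derive(Debug)]
pub struct Meter {
    consumed: u64,
    limit: u64,
}

impl Meter {
    pub fn with_limit(limit: u64) -> Self {
        Self { consumed: 0, limit }
    }

    /// Weight still available in this block.
    pub fn remaining(&self) -> u64 {
        self.limit.saturating_sub(self.consumed)
    }

    /// Whether `w` still fits into the block.
    pub fn can_consume(&self, w: u64) -> bool {
        w <= self.remaining()
    }

    /// Unconditionally record `w` as consumed (saturating).
    pub fn consume(&mut self, w: u64) {
        self.consumed = self.consumed.saturating_add(w);
    }

    pub fn consumed(&self) -> u64 {
        self.consumed
    }
}

/// Mandatory-hook style: the work always runs and its weight is always
/// charged, even if that pushes the block past its limit.
pub fn on_initialize_style(block_consumed: &mut u64, work: u64) {
    *block_consumed = block_consumed.saturating_add(work);
}

/// Poll-hook style: check the meter first and skip the work if it does not
/// fit. Returns whether the work ran in this block.
pub fn on_poll_style(meter: &mut Meter, work: u64) -> bool {
    if meter.can_consume(work) {
        meter.consume(work);
        true
    } else {
        false
    }
}
```

With a limit of 100 and 90 already consumed, the mandatory style pushes the block to 120, while the poll style leaves the meter untouched and retries in a later block.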

Functional Changes

As the minimal diff in the existing tests shows, this change is almost a no-op in the absence of weight scarcity. The only difference is that the start signal from the signed pallet to the verifier pallet is now sent at the end of the signed phase, not at the beginning of the signed validation phase.

Non-Functional Changes

- Now, the only pallets that call on_poll are multi_block (the parent, not verifier and signed) and staking_async. This makes the code easier to audit.
- Removes a lot of on_initialize terminology from weight functions.
- Cleans up some stale variations in the mock setup that allowed us to skip the signed pallet's on-initialize. This no longer makes sense, as the parent pallet is the only one that calls on_poll.

More Test Changes

During this PR, I found multiple instances in the EPMB tests where we roll forward the wrong number of blocks. For example, verifying a solution needs 3 blocks, but we call roll_next 4 times and the test still passes. To harden such cases and make all future tests as explicit as possible, I have:

Generic fn roll_next is made fully private and in tests should be replaced by:
- fn roll_next_and_phase(expect: Phase<T>)
- fn roll_next_and_phase_verifier(expect: Phase<T>, status: Status)
- fn roll_next_and_verifier(status: Status)
This forces every test to state explicitly what the expected Phase/Status should be after moving a block forward.
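The idea behind these helpers can be sketched as follows. This is a toy model, not the EPMB mock: `MockChain`, its `Phase` variants and the phase transitions are hypothetical, and only the pattern (a private `roll_next` wrapped by an asserting public helper) mirrors the change described above.

```rust
/// Toy phases (assumption: not the real EPMB `Phase<T>`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Off,
    Snapshot(u32),
    Signed(u32),
    Done,
}

/// Hypothetical stand-in for the test externalities / mock runtime.
pub struct MockChain {
    block: u32,
    phase: Phase,
}

impl MockChain {
    pub fn new() -> Self {
        Self { block: 0, phase: Phase::Off }
    }

    pub fn block(&self) -> u32 {
        self.block
    }

    /// Not `pub`: callers outside this module cannot advance a block without
    /// also stating what they expect to see afterwards.
    fn roll_next(&mut self) {
        self.block += 1;
        self.phase = match self.phase {
            Phase::Off => Phase::Snapshot(1),
            Phase::Snapshot(0) => Phase::Signed(1),
            Phase::Snapshot(n) => Phase::Snapshot(n - 1),
            Phase::Signed(0) => Phase::Done,
            Phase::Signed(n) => Phase::Signed(n - 1),
            Phase::Done => Phase::Off,
        };
    }

    /// Advance one block and assert the resulting phase.
    pub fn roll_next_and_phase(&mut self, expect: Phase) {
        self.roll_next();
        assert_eq!(self.phase, expect, "unexpected phase at block {}", self.block);
    }
}
```

A test that rolls one block too many now fails at the exact block where the phase diverges, instead of passing silently.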

All tests are updated to respect this paradigm, which has made the diff slightly larger than I wished it to be.

Implementation/Review Notes

Overall Design

The overall idea is to move all operations to a model similar to dispatchables, where before executing f(input) -> Result, we have access to a w(input) -> Weight that gives us the pre-execution weight. If the pre-execution weight is acceptable, we proceed with the execution. The execution may override the pre-execution weight with a smaller value if it so wishes.

/// ### Type
///
/// The commonly used `(Weight, Box<dyn Fn(&mut WeightMeter)>)` should be interpreted as such:
///
/// * The `Weight` is the pre-computed worst case weight of the operation that we are going to
///   do.
/// * The `Box<dyn Fn(&mut WeightMeter)>` is the function that represents the work that
///   will at most consume the said amount of weight. While executing, it will alter the given
///   weight meter to consume the actual weight used. Indeed, the weight that is registered in
///   the `WeightMeter` must never be more than the `Weight` returned as the first item of the
///   tuple.
///
/// In essence, the caller must:
///
/// 1. given an existing `meter`, receive `(worst_weight, exec)`
/// 2. ensure `meter` can consume up to `worst_weight`.
/// 3. if so, call `exec(meter)`, knowing `meter` will accumulate at most `worst_weight` extra.
fn per_block_exec(current_phase: Phase<T>) -> (Weight, Box<dyn Fn(&mut WeightMeter)>) {
    ... 
}
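The three caller steps listed in the doc comment can be sketched as below. This is a minimal model under stated assumptions: `Weight` is reduced to a `u64`, `Meter` is a hypothetical stand-in for `WeightMeter`, the two-variant `Phase` and the weight values are made up, and `poll` is an illustrative name for the caller.

```rust
/// Assumption: one-dimensional weight for illustration.
pub type Weight = u64;

/// Hypothetical stand-in for `WeightMeter`.
#[derive(Debug)]
pub struct Meter {
    consumed: Weight,
    limit: Weight,
}

impl Meter {
    pub fn with_limit(limit: Weight) -> Self {
        Self { consumed: 0, limit }
    }
    pub fn can_consume(&self, w: Weight) -> bool {
        self.consumed.saturating_add(w) <= self.limit
    }
    pub fn consume(&mut self, w: Weight) {
        self.consumed = self.consumed.saturating_add(w);
    }
    pub fn consumed(&self) -> Weight {
        self.consumed
    }
}

type ExecuteFn = Box<dyn Fn(&mut Meter)>;

/// Toy phases (assumption: not the real `Phase<T>`).
#[derive(Debug, Clone, Copy)]
pub enum Phase {
    Off,
    Snapshot,
}

/// Returns the worst-case weight and the deferred work for `phase`.
fn per_block_exec(phase: Phase) -> (Weight, ExecuteFn) {
    match phase {
        Phase::Off => {
            let exec: ExecuteFn = Box::new(|_meter: &mut Meter| {});
            (0, exec)
        }
        Phase::Snapshot => {
            let weight: Weight = 50; // made-up worst case
            let exec: ExecuteFn = Box::new(move |meter: &mut Meter| meter.consume(weight));
            (weight, exec)
        }
    }
}

/// Caller side: 1. obtain `(worst_weight, exec)`, 2. check the meter,
/// 3. execute only if the worst case fits. Returns whether the work ran.
pub fn poll(phase: Phase, meter: &mut Meter) -> bool {
    let (worst_weight, exec) = per_block_exec(phase);
    if !meter.can_consume(worst_weight) {
        return false; // try again in a later block
    }
    let before = meter.consumed();
    exec(meter);
    // The work must never register more than it announced up front.
    debug_assert!(meter.consumed() - before <= worst_weight);
    true
}
```

Returning the closure together with its weight keeps the weight check and the work in one place per phase, so the caller never has to know which phase it is driving.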

Export Weight

While working on this PR, I realized that we were previously never registering the weight of the export process. This is because the export is managed by the staking pallet, which previously had no way of knowing the weight of each export step.

Now, we alter the ElectionProvider::status interface so that it signals not only whether we are ready or not, but also, when we are ready, the weight of the next elect.

fn status() -> Result<Option<Weight>, ()> {
	match <CurrentPhase<T>>::get() {
		// we're not doing anything.
		Phase::Off => Err(()),

		// we're doing sth but not ready.
		Phase::Signed(_) |
		Phase::SignedValidation(_) |
		Phase::Unsigned(_) |
		Phase::Snapshot(_) |
		Phase::Emergency => Ok(None),

		// we're ready, and this is the weight of the next step
		Phase::Done => Ok(Some(T::WeightInfo::export_non_terminal())),
		Phase::Export(p) =>
			if p.is_zero() {
				Ok(Some(T::WeightInfo::export_terminal()))
			} else {
				Ok(Some(T::WeightInfo::export_non_terminal()))
			},
	}
}
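On the consuming side, the three possible return values might be handled roughly as follows. This is a sketch only: `Decision`, `decide` and the one-dimensional `Weight` are hypothetical and do not reflect the staking pallet's actual code, which is not shown in this PR description.

```rust
/// Assumption: one-dimensional weight for illustration.
pub type Weight = u64;

/// Hypothetical outcome of inspecting the election provider's status.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// `Err(())`: no election is ongoing.
    NoElection,
    /// `Ok(None)`: an election is ongoing but not ready for export.
    NotReady,
    /// Ready, but the next `elect` does not fit into the remaining weight.
    Deferred { needed: Weight },
    /// Ready, and the next `elect` fits; this is the weight to charge.
    Export { charged: Weight },
}

/// Decide what to do given `status()` and the weight left in this block.
pub fn decide(status: Result<Option<Weight>, ()>, remaining: Weight) -> Decision {
    match status {
        Err(()) => Decision::NoElection,
        Ok(None) => Decision::NotReady,
        Ok(Some(w)) if w <= remaining => Decision::Export { charged: w },
        Ok(Some(w)) => Decision::Deferred { needed: w },
    }
}
```

Carrying the weight inside the readiness signal means the consumer can make the pre-execution check without knowing anything about the provider's phases.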

Integration

The only breaking change of this PR is:

impl multi_block::Config for Runtime {
    // .. 
    type Signed = multi_block_signed::Pallet<Self>;
}

While not mandatory, the fellowship runtimes should use the new check_all_weights function to test the weights.

Path To Weight Refund

The usage of WeightMeter is intentional here to pave the way to reclaiming the weight in a subsequent PR. It will look like this:

--- a/substrate/frame/election-provider-multi-block/src/lib.rs
+++ b/substrate/frame/election-provider-multi-block/src/lib.rs
@@ -1310,6 +1310,7 @@ impl<T: Config> Pallet<T> {
 	/// 2. ensure `meter` can consume up to `worst_weight`.
 	/// 3. if so, call `exec(meter)`, knowing `meter` will accumulate at most `worst_weight` extra.
 	fn per_block_exec(current_phase: Phase<T>) -> (Weight, Box<dyn Fn(&mut WeightMeter)>) {
+		use cumulus_primitives_storage_weight_reclaim::StorageWeightReclaimer;
 		type ExecuteFn = Box<dyn Fn(&mut WeightMeter)>;
 		let noop: (Weight, ExecuteFn) = (T::WeightInfo::per_block_nothing(), Box::new(|_| {}));
 
@@ -1318,8 +1319,9 @@ impl<T: Config> Pallet<T> {
 				// first snapshot
 				let weight = T::WeightInfo::per_block_snapshot_msp();
 				let exec: ExecuteFn = Box::new(move |meter: &mut WeightMeter| {
+					let mut reclaimer = StorageWeightReclaimer::new(meter);
 					Self::create_targets_snapshot();
-					meter.consume(weight)
+					let _reclaimed = reclaimer.reclaim_with_meter(meter);
 				});
 				(weight, exec)
 			},
@@ -1328,8 +1330,9 @@ impl<T: Config> Pallet<T> {
 				// rest of the snapshot, incl last one.
 				let weight = T::WeightInfo::per_block_snapshot_rest();
 				let exec: ExecuteFn = Box::new(move |meter: &mut WeightMeter| {
+					let mut reclaimer = StorageWeightReclaimer::new(meter);
 					Self::create_voters_snapshot_paged(x);
-					meter.consume(weight)
+					let _reclaimed = reclaimer.reclaim_with_meter(meter);
 				});
 				(weight, exec)
 			},

In short, instead of consuming the worst case weight, we consume the accurate amount given to us by the weight reclaimer.
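The accounting difference can be illustrated with plain numbers. The sketch below does not model how a reclaimer measures anything: `charge` is a hypothetical helper, weights are a single `u64`, and the measured amount is simply passed in as an argument.

```rust
/// Charge one step against a running `consumed` total.
///
/// * `worst`: the pre-computed worst-case weight of the step.
/// * `measured`: the actual usage if known (assumption: supplied by the
///   caller here), or `None` to fall back to the worst case.
///
/// Returns the amount saved relative to the worst case.
pub fn charge(consumed: &mut u64, worst: u64, measured: Option<u64>) -> u64 {
    // Never charge more than the worst case that was checked up front.
    let actual = measured.map_or(worst, |m| m.min(worst));
    *consumed = consumed.saturating_add(actual);
    worst - actual
}
```

Capping at the worst case preserves the invariant from the doc comment: the meter never accumulates more than the weight that was announced before execution.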

TODO

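- [x] Unit tests
- [x] Unified integration tests for weights
- [x] weight for RoundRotation is missing (export -> off phase)
- [x] Run all papi-integration tests at the end once.
- [x] Weight update / closes #7714
- [ ] Test to ensure pallet ordering is not important anymore.
- [x] closes #8910
- [x] Upgrade block on-init -> on-poll
- [x] queue to audit
- [ ] audit done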

kianenigma · 2025-10-06T10:46:29Z

/cmd bench --help

github-actions · 2025-10-06T10:47:05Z

Command help:

usage: /cmd bench [-h] [--quiet] [--clean] [--image IMAGE]
                  [--runtime [{dev,westend,rococo,asset-hub-westend,asset-hub-rococo,bridge-hub-rococo,bridge-hub-westend,collectives-westend,coretime-rococo,coretime-westend,glutton-westend,people-rococo,people-westend} ...]]
                  [--pallet [PALLET ...]] [--fail-fast]

options:
  -h, --help            show this help message and exit
  --quiet               Won't print start/end/failed messages in PR
  --clean               Clean up the previous bot's & author's comments in PR
  --image IMAGE         Override docker image '--image
                        docker.io/paritytech/ci-unified:latest'
  --runtime [{dev,westend,rococo,asset-hub-westend,asset-hub-rococo,bridge-hub-rococo,bridge-hub-westend,collectives-westend,coretime-rococo,coretime-westend,glutton-westend,people-rococo,people-westend} ...]
                        Runtime(s) space separated
  --pallet [PALLET ...]
                        Pallet(s) space separated
  --fail-fast           Fail fast on first failed benchmark

**Examples**:
 Runs all benchmarks 
 /cmd bench

 Runs benchmarks for pallet_balances and pallet_multisig for all runtimes which have these pallets. **--quiet** makes it to output nothing to PR but reactions
 /cmd bench --pallet pallet_balances pallet_xcm_benchmarks::generic --quiet
 
 Runs bench for all pallets for westend runtime and fails fast on first failed benchmark
 /cmd bench --runtime westend --fail-fast
 
 Does not output anything and cleans up the previous bot's & author command triggering comments in PR 
 /cmd bench --runtime westend rococo --pallet pallet_balances pallet_multisig --quiet --clean

kianenigma · 2025-10-06T10:47:57Z

/cmd bench --pallet pallet_election_provider_multi_block pallet_election_provider_multi_block_signed pallet_election_provider_multi_block_verifier pallet_election_provider_multi_block_unsigned --runtime asset-hub-westend

github-actions · 2025-10-06T10:48:42Z

Command "bench --pallet pallet_election_provider_multi_block pallet_election_provider_multi_block_signed pallet_election_provider_multi_block_verifier pallet_election_provider_multi_block_unsigned --runtime asset-hub-westend" has started 🚀 See logs here

github-actions · 2025-10-06T11:22:15Z

Command "bench --pallet pallet_election_provider_multi_block pallet_election_provider_multi_block_signed pallet_election_provider_multi_block_verifier pallet_election_provider_multi_block_unsigned --runtime asset-hub-westend" has failed ❌! See logs here


paritytech-workflow-stopper · 2025-12-01T09:51:47Z

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/19818107671
Failed job name: test-linux-stable-no-try-runtime

paritytech-release-backport-bot · 2025-12-01T20:54:52Z

Created backport PR for unstable2507:

[unstable2507] Backport #9925 #10496 with remaining conflicts!

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-9925-to-unstable2507
git worktree add --checkout .worktree/backport-9925-to-unstable2507 backport-9925-to-unstable2507
cd .worktree/backport-9925-to-unstable2507
git reset --hard HEAD^
git cherry-pick -x 05a3fb107e488378075e956186df34d04c1bf656
git push --force-with-lease

paritytech-release-backport-bot · 2025-12-01T20:54:58Z

Created backport PR for stable2509:

[stable2509] Backport #9925 #10497 with remaining conflicts!

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-9925-to-stable2509
git worktree add --checkout .worktree/backport-9925-to-stable2509 backport-9925-to-stable2509
cd .worktree/backport-9925-to-stable2509
git reset --hard HEAD^
git cherry-pick -x 05a3fb107e488378075e956186df34d04c1bf656
git push --force-with-lease

paritytech-release-backport-bot · 2025-12-01T20:55:04Z

Created backport PR for stable2512:

[stable2512] Backport #9925 #10498 with remaining conflicts!

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin backport-9925-to-stable2512
git worktree add --checkout .worktree/backport-9925-to-stable2512 backport-9925-to-stable2512
cd .worktree/backport-9925-to-stable2512
git reset --hard HEAD^
git cherry-pick -x 05a3fb107e488378075e956186df34d04c1bf656
git push --force-with-lease


…igned phase (#11156) on_initialize claimed on_initialize_into_signed weight (> ~3M proof_size) every block during the Signed phase, but the heavy work (loading voter snapshots) only happens once when transitioning from Snapshot into Signed. Use discriminant comparison to distinguish phase entry from same-phase ticks, falling back to on_initialize_nothing for the latter. A proper fix would be backporting #9925 into stable2512 - but this would be a bigger change not compatible with the tight timeline of 2.1.0 release on Polkadot/Kusama. --------- Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>


kianenigma added 12 commits September 23, 2025 18:36

feature in good shape, most tests pass, except those that rely in moc…

52ad11f

…k signed shit

new combinator logic, all tests pass

2608c36

fix benchmarks too

88cb160

move to poll, rename all weights

fac1956

cleanup mock signed stuff

65a2964

staking uses poll now as well

1e4bfc2

tidy up

624e627

fix ahm tests and merge master

f74d49f

solid round of self-review

c05bf0d

Merge branch 'master' of github.com:paritytech/polkadot-sdk into kiz-…

9bcba02

…epmb-poll

remove some todos

10afe31

fmt

0c4289d

kianenigma requested a review from a team as a code owner October 3, 2025 12:26

kianenigma added T2-pallets This PR/Issue is related to a particular pallet. A4-backport-unstable2507 Pull request must be backported to the unstable2507 release branch labels Oct 3, 2025

fix build

5b712bb

sigurpol self-requested a review October 3, 2025 13:21

kianenigma added 2 commits October 6, 2025 11:42

add unit test

7bda95e

Merge branch 'master' of github.com:paritytech/polkadot-sdk into kiz-…

fe82b86

…epmb-poll