fix: slot-based collator shuts down immediately after init by sigurpol · Pull Request #11628 · paritytech/polkadot-sdk

sigurpol · 2026-04-03T08:07:14Z

Fix a regression introduced by #11381, where we wrapped the slot-based collator launch in an async task that first calls wait_for_aura, then spawns the actual long-running collator tasks via slot_based::run(). The wrapper was spawned with spawn_essential_handle().

Essential tasks shut down the node when they complete. The init wrapper completes immediately after spawning, the TaskManager sees an essential task exit, and the node shuts down.

This only affects parachain collators started with --authoring=slot-based.

Fix: use spawn_handle() for the short-lived init wrapper. The child tasks inside slot_based::run() remain correctly marked as essential.
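The failure mode described above can be reproduced in miniature with plain threads. This is a minimal sketch, not the SDK's code: `EssentialHandle`, `run_scenario`, and the task names are hypothetical stand-ins, and a channel message plays the role of the TaskManager noticing that an essential task exited.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for an essential spawn handle: when a task spawned
/// through it returns, its name is reported as a shutdown reason.
#[derive(Clone)]
struct EssentialHandle {
    exit_tx: Sender<&'static str>,
}

impl EssentialHandle {
    fn spawn(&self, name: &'static str, task: impl FnOnce() + Send + 'static) {
        let exit_tx = self.exit_tx.clone();
        thread::spawn(move || {
            task();
            // Only reached if the task returns.
            let _ = exit_tx.send(name);
        });
    }
}

/// Runs the init wrapper either as an essential task (the regression) or as a
/// plain task (the fix). Returns the name of the essential task that exited
/// within the observation window, if any.
fn run_scenario(wrapper_is_essential: bool) -> Option<&'static str> {
    let (exit_tx, exit_rx) = channel();
    let essential = EssentialHandle { exit_tx };
    let child_handle = essential.clone();

    let init_wrapper = move || {
        // Stand-in for waiting until the node is ready to author.
        thread::sleep(Duration::from_millis(10));
        // The long-running child stays essential in both scenarios.
        child_handle.spawn("collator-child", || loop {
            thread::sleep(Duration::from_millis(50));
        });
        // The wrapper returns here, right after spawning.
    };

    if wrapper_is_essential {
        essential.spawn("init-wrapper", init_wrapper);
    } else {
        thread::spawn(init_wrapper);
    }

    exit_rx.recv_timeout(Duration::from_millis(500)).ok()
}

fn main() {
    println!("essential wrapper: {:?}", run_scenario(true));
    println!("plain wrapper:     {:?}", run_scenario(false));
}
```

With the wrapper marked essential, the model reports `init-wrapper` as an exited essential task almost immediately. With a plain wrapper, nothing is reported during the observation window, while the child task is still tracked as essential.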

An easy way to reproduce (the same setup used by the staking-miner nightly test, which started to fail after #11381 was merged): spawn a Zombienet network with a 2-validator relay chain and a single slot-based parachain collator. The collator process starts but shuts down immediately.
For example, in your SDK repo:

cd substrate/frame/staking-async/runtimes/papi-tests
just setup
just run fake-dev

which launches Zombienet, spawning:

alice (relay validator, port 9944) — polkadot
bob (relay validator, port 9945) — polkadot
charlie (parachain collator, port 9946) — polkadot-parachain --collator --authoring=slot-based

Port 9946 never comes up.
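A quick way to check which of the three ports are accepting connections is a small TCP probe. This is a sketch under the assumption that the nodes listen on 127.0.0.1; `port_is_up` is a hypothetical helper, not part of any tool mentioned here.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
fn port_is_up(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

fn main() {
    let timeout = Duration::from_millis(500);
    // Node names and ports as listed above; the loopback address is an assumption.
    for (name, port) in [("alice", 9944u16), ("bob", 9945), ("charlie", 9946)] {
        let addr = SocketAddr::from(([127, 0, 0, 1], port));
        let state = if port_is_up(addr, timeout) { "up" } else { "down" };
        println!("{name} (port {port}): {state}");
    }
}
```

A successful connection only shows that something is listening on the port; it does not confirm that the RPC endpoint is healthy.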

I have also verified that the fix from #11381 still works by manually running `./target/release/polkadot-parachain --chain asset-hub-polkadot --sync warp --authoring=slot-based --tmp -- --sync warp`.

Regression from a1a2bbf ("Fix slot-based collator panic during warp sync"). That commit wrapped the slot-based collator launch in an async task that first calls `wait_for_aura`, then spawns the actual long-running collator tasks via `slot_based::run()`. The wrapper was spawned with `spawn_essential_handle()`.

Essential tasks shut down the node when they complete; by design, they are expected to run forever. Unlike the lookahead collator (whose `aura::run_with_export().await` loops indefinitely), `slot_based::run()` is synchronous: it spawns two child essential tasks and returns. So the init wrapper completes immediately after spawning, the TaskManager sees an essential task exit, and the node shuts down.

This only affects parachain collators started with `--authoring=slot-based` (e.g. the collator on ws port 9946 in a Zombienet setup). Relay chain nodes (ports 9944/9945) use BABE/GRANDPA and are unaffected.

Fix: use `spawn_handle()` for the short-lived init wrapper. The child tasks inside `slot_based::run()` remain correctly marked as essential.
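The contrast between the two call shapes can be sketched with threads: a call that is itself the long-running work keeps its wrapper alive, while a call that only spawns workers lets the wrapper return at once. The names `run_blocking`, `run_spawning`, and `wrapper_finishes_within` are hypothetical, and a bounded loop stands in for work that would really run forever.

```rust
use std::sync::mpsc::channel;
use std::thread;
use std::time::Duration;

/// Blocking shape: the call itself is the long-running work.
fn run_blocking(rounds: u32) {
    for _ in 0..rounds {
        thread::sleep(Duration::from_millis(50));
    }
}

/// Spawning shape: starts two background workers and returns immediately.
fn run_spawning(rounds: u32) {
    for _ in 0..2 {
        thread::spawn(move || run_blocking(rounds));
    }
}

/// Returns true if `wrapper` finished within `timeout`.
fn wrapper_finishes_within(wrapper: impl FnOnce() + Send + 'static, timeout: Duration) -> bool {
    let (done_tx, done_rx) = channel();
    thread::spawn(move || {
        wrapper();
        let _ = done_tx.send(());
    });
    done_rx.recv_timeout(timeout).is_ok()
}

fn main() {
    let window = Duration::from_millis(100);
    println!("spawning wrapper done early: {}", wrapper_finishes_within(|| run_spawning(20), window));
    println!("blocking wrapper done early: {}", wrapper_finishes_within(|| run_blocking(20), window));
}
```

A wrapper around the spawning shape finishes inside the window even though its workers are still running, which is why marking such a wrapper as essential is a problem.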

sigurpol · 2026-04-03T08:10:16Z

/cmd prdoc --audience runtime_dev --bump patch


sigurpol · 2026-04-03T08:29:30Z

cc @clangenb - PTAL if the change makes sense for you too 🙏

clangenb · 2026-04-03T09:26:30Z

Yoo, sorry, expected the regular non-warp sync case to be tested in CI here, and I did not wait for the para warp sync to finish when I tested. XD

However, it seems there are relevant scenarios not tested in CI - I guess we should add a follow-up issue to that?

EDIT: Fix looks good obviously

sigurpol · 2026-04-03T09:51:13Z

> Yoo, sorry, expected the regular non-warp sync case to be tested in CI here, and I did not wait for the para warp sync to finish. XD
>
> However, it seems there are relevant scenarios not tested in CI - I guess we should add a follow-up issue to that?
>
> EDIT: Fix looks good obviously

Thanks for the feedback. Yes, I believe we could definitely improve CI coverage. For staking, we have discussed eventually making the tests and setup under staking-async/runtimes/papi-tests part of CI, but we haven't prioritized that yet. Independently of that, I think we should probably have this basic use case, spawning something similar to what I described, as part of CI. Maybe @pepoviola, as Zombienet's wizard, together with the node experts, has suggestions or ideas; I am definitely not authoritative here. I noticed the issue only because the staking-miner nightly job spawns the setup I described in the PR against the latest SDK and started to fail miserably 😅

skunert

Thanks! Missed that indeed. We have zombienet tests for authoring, but they use the test-parachain binary, because it has extra CLI flags that are needed for some scenarios. It might be better to switch to Omni node for the tests that don't require anything special.

sigurpol added the T9-cumulus label (This PR/Issue is related to cumulus.) Apr 3, 2026

github-actions Bot and others added 2 commits April 3, 2026 08:12

Update from github-actions[bot] running command 'prdoc --audience runtime_dev --bump patch'

10ea79d

fix prdoc

dd27edb

sigurpol mentioned this pull request Apr 3, 2026

staking-miner integration tests failed against latest polkadot build. paritytech/polkadot-staking-miner#1261

Closed

sigurpol requested review from bkchr, pepoviola and skunert April 3, 2026 08:26

sigurpol requested a review from serban300 April 3, 2026 08:30

fmt

be8812e

skunert approved these changes Apr 3, 2026


Ank4n approved these changes Apr 3, 2026


acatangiu approved these changes Apr 3, 2026


sigurpol added this pull request to the merge queue Apr 3, 2026

Merged via the queue into master with commit 6324a66 Apr 3, 2026
249 of 256 checks passed

sigurpol deleted the sigurpol-fix-regression-a1a2bbfdb4 branch April 3, 2026 14:58

sigurpol merged 4 commits into master from sigurpol-fix-regression-a1a2bbfdb4

5 participants
