Cumulus: fix pre-connect to backers for lonely collators#10305

Merged
sandreim merged 7 commits into master from sandreim/preconnect_fixes on Nov 14, 2025
@sandreim
Contributor

When running a single collator (most commonly on testnets), the block builder task is always able to claim a slot, so we're never triggering the pre-connect mechanism which happens for slots owned by other authors.
Additionally I fixed some tests.
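
The situation described above can be illustrated with a toy model. This is a minimal sketch and not the code changed in this PR: it assumes round-robin Aura slot assignment (`slot % authorities`), and the function names as well as the "look one slot ahead" rule are hypothetical choices made for illustration only.

```rust
/// Hypothetical stand-in for an Aura slot number.
type Slot = u64;

/// Round-robin assignment: the author of `slot` is `slot % authorities`.
/// `authorities` must be non-zero.
fn slot_owner(slot: Slot, authorities: usize) -> usize {
    (slot % authorities as u64) as usize
}

/// Toy model of the problem: pre-connect is only considered on the code path
/// taken when the current slot belongs to another author.
fn preconnect_only_on_foreign_slots(our_index: usize, authorities: usize, slot: Slot) -> bool {
    let owns_current = slot_owner(slot, authorities) == our_index;
    let owns_next = slot_owner(slot + 1, authorities) == our_index;
    !owns_current && owns_next
}

/// Toy model of one possible fix (an assumption, not the PR's logic):
/// look at the upcoming slot regardless of who owns the current one.
fn preconnect_when_next_slot_is_ours(our_index: usize, authorities: usize, slot: Slot) -> bool {
    slot_owner(slot + 1, authorities) == our_index
}
```

With a single authority, `preconnect_only_on_foreign_slots(0, 1, slot)` is false for every slot, because the lone collator owns all of them, while the second function returns true for every slot.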

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
@sandreim sandreim moved this from Backlog to Review/Audit in progress in parachains team board Nov 12, 2025
@sandreim
Contributor Author

/cmd prdoc --audience node_dev --bump patch

@sandreim sandreim added the labels T0-node (This PR/Issue is related to the topic “node”.) and T9-cumulus (This PR/Issue is related to cumulus.) Nov 12, 2025
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
@sandreim sandreim requested review from bkchr and skunert November 12, 2025 15:31
@skunert
Contributor

skunert commented Nov 12, 2025

DQ: Does it make sense to preconnect when running with a single collator? Since we submit something every slot, we should be constantly connected anyway or not?

Comment thread cumulus/client/consensus/aura/src/collators/mod.rs
Comment thread cumulus/client/consensus/aura/src/collators/mod.rs
Comment thread cumulus/client/consensus/aura/src/collators/lookahead.rs Outdated
Comment thread cumulus/client/consensus/aura/src/collators/slot_based/block_builder_task.rs Outdated
@sandreim
Contributor Author

sandreim commented Nov 13, 2025

> DQ: Does it make sense to preconnect when running with a single collator? Since we submit something every slot, we should be constantly connected anyway or not?

We will not be constantly connected, because backing groups rotate. The collator protocol keeps track of this and updates connections to the new backing groups, but only if we sent a pre-connect message. Roughly speaking, without this message it works as before: connect when you have a collation.
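
A rough way to picture the difference is the following state model. It is only a sketch under stated assumptions: `BackerConnections` and its methods are hypothetical names, not types from the collator protocol, and the model tracks a single backing group index instead of real peer connections.

```rust
/// Hypothetical model of which backing group a collator is connected to.
#[derive(Debug, Default)]
struct BackerConnections {
    /// Set once a pre-connect request has been sent.
    preconnect_requested: bool,
    /// Index of the backing group we currently hold connections to, if any.
    connected_group: Option<u32>,
}

impl BackerConnections {
    /// Ask to be connected ahead of time to the current backing group.
    fn preconnect(&mut self, current_group: u32) {
        self.preconnect_requested = true;
        self.connected_group = Some(current_group);
    }

    /// Backing groups rotated: connections follow only if pre-connect was requested.
    fn on_group_rotation(&mut self, new_group: u32) {
        if self.preconnect_requested {
            self.connected_group = Some(new_group);
        }
    }

    /// The pre-existing behaviour: connect once a collation is ready to be sent.
    fn on_collation_ready(&mut self, assigned_group: u32) {
        self.connected_group = Some(assigned_group);
    }

    fn is_connected_to(&self, group: u32) -> bool {
        self.connected_group == Some(group)
    }
}
```

In this model a collator that never calls `preconnect` is not connected to the new group right after a rotation and only catches up in `on_collation_ready`, which matches the "connect when you have a collation" description.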

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Comment thread cumulus/client/consensus/aura/src/collators/lookahead.rs Outdated
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
@sandreim sandreim added this pull request to the merge queue Nov 14, 2025
Merged via the queue into master with commit 31f8f8d Nov 14, 2025
253 of 256 checks passed
@sandreim sandreim deleted the sandreim/preconnect_fixes branch November 14, 2025 16:06
@github-project-automation github-project-automation Bot moved this from Review/Audit in progress to Completed in parachains team board Nov 14, 2025
EgorPopelyaev added a commit to EgorPopelyaev/polkadot-sdk that referenced this pull request Nov 18, 2025
* [CI/CD] Check semver job improvements (paritytech#10323)

This PR adds a couple of improvements to the Check semver job for the
stable branches:
1. The `validate: false` option can now be set not only on `major`
bumps but on `minor` and `patch` bumps as well. This is useful for
backport cases where the desired bump does not match the one
that the `parity-publish` semver check has predicted (like
[here](https://github.com/paritytech/polkadot-sdk/actions/runs/19135068993/job/54685184577?pr=10221))
2. The check can be skipped when it is not needed but still
fails (like on the post crates release
[prs](https://github.com/paritytech/polkadot-sdk/actions/runs/18311557391/job/52141285274?pr=9951))

closes: paritytech/release-engineering#274

* Cumulus: fix pre-connect to backers for lonely collators (paritytech#10305)

When running a single collator (most commonly on testnets), the block
builder task is always able to claim a slot, so we're never triggering
the pre-connect mechanism which happens for slots owned by other
authors.
Additionally I fixed some tests.

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* frame-system: Only enable special benchmarking code when running in `no_std` (paritytech#10321)

This fixes `cargo test -p cumulus-pallet-parachain-system --features
runtime-benchmarks`

* fix: support `paginationStartKey` parameter for `archive_v1_storage` (paritytech#10329)

Fixes paritytech#10185

This PR adds support for the `paginationStartKey` parameter in the
`archive_v1_storage` JSON RPC API for the query types `descendantsValues`
and `descendantsHashes`, per [the latest
specs](https://paritytech.github.io/json-rpc-interface-spec/api/archive_v1_storage.html).

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bastian Köcher <git@kchr.de>

* Rename `SlotSchedule` to `TargetBlockRate` (paritytech#10316)

This renames the `SlotSchedule` runtime api to `TargetBlockRate`. It
also changes the signature to return only the target block rate. As
discussed at the retreat, we don't need the block time returned as part
of this runtime api.

* chore: update zombienet environment vars (paritytech#10293)

# Description
paritytech#9724

---------

Co-authored-by: Javier Viola <363911+pepoviola@users.noreply.github.com>

* Skip building on blocks on relay parents in old session (paritytech#9990)

Fixes: paritytech#9977

On our Kusama Canary chain YAP-3392, the log entry:
```
Collation wasn't advertised because it was built on a relay chain block that is now part of an old session
```
[shows up 400+ times (2025-10-03 --
2025-10-10)](https://grafana.teleport.parity.io/goto/spoPcDeHR?orgId=1).

# Changes
Changed `offset_relay_parent_find_descendants` to return `None` if the
`relay_best_hash` or any of its ancestors contains an epoch change.
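
The idea can be sketched as follows. This is a simplified illustration and not the body of `offset_relay_parent_find_descendants`: the `RelayHeader` type, the `has_epoch_change` flag and the slice-based chain are assumptions, and the sketch checks the best block and the ancestors between it and the chosen relay parent (excluding the relay parent itself), which may differ from the real range.

```rust
/// Hypothetical, simplified relay chain header.
#[derive(Clone, Debug, PartialEq)]
struct RelayHeader {
    number: u32,
    /// True if this block carries an epoch change.
    has_epoch_change: bool,
}

/// Picks the relay parent `offset` blocks behind the best block.
/// `chain` is ordered oldest to newest; its last element is the best block.
/// Returns `None` if the chain is too short, or if the best block or any
/// ancestor walked on the way to the relay parent contains an epoch change.
fn relay_parent_with_offset(chain: &[RelayHeader], offset: usize) -> Option<&RelayHeader> {
    let best_index = chain.len().checked_sub(1)?;
    let parent_index = best_index.checked_sub(offset)?;

    // Blocks newer than the candidate relay parent, up to and including the best block.
    let walked = &chain[parent_index + 1..];
    if walked.iter().any(|header| header.has_epoch_change) {
        return None;
    }
    Some(&chain[parent_index])
}

/// Helper for building test chains: block `i` has an epoch change iff `i` is in `changes`.
fn make_chain(len: u32, changes: &[u32]) -> Vec<RelayHeader> {
    (0..len)
        .map(|number| RelayHeader { number, has_epoch_change: changes.contains(&number) })
        .collect()
}
```

Returning `None` here means "do not build on this relay parent", so no collation is produced that would later be rejected as belonging to an old session.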

---------

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>

* ci: ci-unified with resolc 0.5.0 (paritytech#10325)

cc paritytech/devops#4508
cc @athei

* Introduce `ReplayProofSizeProvider`, `RecordingProofProvider` & transactional extensions (paritytech#9930)

The `ProofSizeExt` extension is used to serve the proof size to the
runtime. It uses the proof recorder to request the current proof size.
The `RecordingProofProvider` extension can record the calls to the proof
size function. Later the `ReplayProofSizeProvider` can be used to replay
these recorded proof sizes. So, the proof recorder is not required
anymore.
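
The record-then-replay idea can be sketched in a few lines. The types below are stand-ins written for this example and are not the `RecordingProofProvider` or `ReplayProofSizeProvider` from the PR: the `ProofSize` trait, `FakeRecorder`, `Recording` and `Replay` are all hypothetical, and the fake recorder simply grows by 100 bytes per query.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;

/// Stand-in trait for "something that can report the current proof size".
trait ProofSize {
    fn current_size(&self) -> usize;
}

/// Stand-in for a live proof recorder: the proof grows by 100 bytes per query.
#[derive(Default)]
struct FakeRecorder {
    queries: Cell<usize>,
}

impl ProofSize for FakeRecorder {
    fn current_size(&self) -> usize {
        self.queries.set(self.queries.get() + 1);
        self.queries.get() * 100
    }
}

/// Wraps a provider and logs every value it returns.
struct Recording<P> {
    inner: P,
    log: RefCell<Vec<usize>>,
}

impl<P: ProofSize> Recording<P> {
    fn new(inner: P) -> Self {
        Self { inner, log: RefCell::new(Vec::new()) }
    }

    /// Consumes the wrapper and returns the recorded sizes in call order.
    fn into_recorded(self) -> Vec<usize> {
        self.log.into_inner()
    }
}

impl<P: ProofSize> ProofSize for Recording<P> {
    fn current_size(&self) -> usize {
        let size = self.inner.current_size();
        self.log.borrow_mut().push(size);
        size
    }
}

/// Serves previously recorded sizes in order, without any recorder.
struct Replay {
    remaining: RefCell<VecDeque<usize>>,
}

impl Replay {
    fn new(recorded: Vec<usize>) -> Self {
        Self { remaining: RefCell::new(recorded.into()) }
    }
}

impl ProofSize for Replay {
    fn current_size(&self) -> usize {
        self.remaining
            .borrow_mut()
            .pop_front()
            .expect("more queries replayed than were recorded")
    }
}
```

A second execution that issues the same sequence of queries against `Replay` observes exactly the sizes seen during the first run, which is why the recorder itself is no longer needed at that point.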

Extensions are now also hooked into the transactional system. This means
they are called when a new transaction is created and informed when a
transaction is committed or reverted.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix P256Verify precompile address (paritytech#10336)

fix paritytech/contract-issues#220

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* parachain-consensus: Do not pin blocks on the relay chain during syncing (paritytech#10333)

We had reports in the past about `polkadot-parachain` consuming a lot of
memory during syncing. I spent some time investigating this again.

This graph shows memory consumption during the sync process:
<img width="1256" height="302" alt="image"
src="https://github.com/user-attachments/assets/eec1b510-1aa8-446e-8088-5ff0daab6252"
/>

We see a rise to 50 GB, then a large release of memory, after which the
node stabilizes at around 20 GB. While I still find that relatively high, I
found that the large reduction in memory towards the end was caused by
finality notifications. I tracked down the culprit to be
`parachain-consensus`. It performs long-blocking finalization operations
and keeps finality notifications around while doing so.

In this PR I introduce a new task that fetches the included block and
then immediately releases the finality notifications of the relay chain.
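
The pattern can be sketched with standard library channels. This is an illustration under assumptions, not the task added in the PR: `PinnedBlock`, `FinalityNotification` and `forward_hashes` are hypothetical, an `Arc` stands in for whatever the real notification keeps alive, and a plain `u64` stands in for the data the slow consumer actually needs.

```rust
use std::sync::mpsc::{Receiver, Sender};
use std::sync::Arc;

/// Stand-in for block data that stays in memory while a notification is alive.
struct PinnedBlock;

/// Hypothetical finality notification holding a reference to pinned data.
struct FinalityNotification {
    hash: u64,
    _pin: Arc<PinnedBlock>,
}

/// Forwarding task: copy out the small piece of data that is needed, release
/// the notification right away, and only then hand the data to the slow consumer.
fn forward_hashes(notifications: Receiver<FinalityNotification>, hashes: Sender<u64>) {
    for notification in notifications {
        let hash = notification.hash;
        // Dropping here releases the pin before any long-running work starts.
        drop(notification);
        if hashes.send(hash).is_err() {
            // The consumer is gone, so there is nothing left to do.
            break;
        }
    }
}
```

Because the consumer only ever sees the forwarded values, a long-blocking finalization step no longer determines how long the notifications, and the memory they reference, stay around.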

Memory is now bounded at around 12 GB:
<img width="1248" height="308" alt="image"
src="https://github.com/user-attachments/assets/5a8be3bb-02a2-400f-9d0d-87ec298ce09f"
/>

closes paritytech#1662

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Allow DT CI to be manually triggered (paritytech#10337)

# Description

This is a small PR that allows for the differential testing job to be
manually triggered instead of _only_ being triggered by PRs.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* [Release|CI/CD] Use larger runners only for the polkadot-parachain and polkadot-omni-node builds (paritytech#10343)

This PR changes the RC build flow so that the large GitHub runners are
used only for the `polkadot-parachain` and `polkadot-omni-node`
builds. The other binaries build fine on the standard runners, which
also saves some costs and resources.

closes: paritytech/release-engineering#279

* Make tasks local only. (paritytech#10162)

Related to paritytech#9693

In the transaction pool, transactions are identified by the tag they
provide.

For tasks, the provided tag is simply the hash of the encoded task.

Nothing in the doc says that implementers should be careful that there
are not too many tasks for a single operation. For example, take a task
`migrate_keys(limit)` where `limit` is valid from 1 to 10_000. Then all tasks
`migrate_keys(1)`, `migrate_keys(2)` ... `migrate_keys(10_000)` are
valid and effectively do the same operation: they all migrate part of
the keys.
In this case a malicious person can submit all those tasks at once and
spam the transaction pool with 10_000 transactions.

I see multiple solutions:
* (1) we are careful when we implement tasks and we make the doc clear, but
the API stays error prone. (In my example above we would implement just
`migrate_keys`, and inside the call we would migrate at a fixed rate
of 1000 keys per bulk.)
* (2) we have a new value returned that is the provided tag for the
task. Or we use the task index as the provided tag.
* (3) we only accept local tasks: <-- implemented in this PR.

Maybe (2) is a better API if we want external submission of tasks.
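
The spam scenario and option (3) can be sketched as below. This is a toy model and not FRAME code: `Task`, `Source`, `provided_tag` and `validate_task` are hypothetical, `DefaultHasher` stands in for the real hash of the encoded task, and treating in-block tasks as acceptable alongside local ones is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical task with a caller-chosen parameter.
#[derive(Hash)]
enum Task {
    MigrateKeys { limit: u32 },
}

/// Where a task submission came from.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Source {
    Local,
    InBlock,
    External,
}

/// Tag derived from the whole task, so every distinct `limit` yields a distinct entry.
fn provided_tag(task: &Task) -> u64 {
    let mut hasher = DefaultHasher::new();
    task.hash(&mut hasher);
    hasher.finish()
}

/// Counts how many different tags the tasks `MigrateKeys { 1..=max_limit }` produce.
fn distinct_tags(max_limit: u32) -> usize {
    (1..=max_limit)
        .map(|limit| provided_tag(&Task::MigrateKeys { limit }))
        .collect::<HashSet<u64>>()
        .len()
}

/// Option (3): refuse tasks submitted from outside the node.
fn validate_task(source: Source, task: &Task) -> Result<u64, &'static str> {
    match source {
        Source::Local | Source::InBlock => Ok(provided_tag(task)),
        Source::External => Err("tasks are only accepted from local sources"),
    }
}
```

Since each `limit` produces its own tag, the pool cannot deduplicate the 10_000 variants, and rejecting external submissions removes the spam vector without changing how tags are computed.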

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Version bumps and prdocs reordering from stable2509-2 (paritytech#10339)

This PR backports regular version bumps and prdoc reordering from the
release branch back to master

* Fix the `CodeNotFound` issue in PolkaVM tests (paritytech#10298)

# Description

This PR bumps the commit hash of the revive-differential-tests framework
to a version that contains a fix for the `CodeNotFound` issue we've been
seeing with PolkaVM. The framework now uploads the code of all the
contracts prior to running the tests.

When CI runs for this PR we should observe that there are either no more
`CodeNotFound` errors in PolkaVM tests or that they are greatly reduced.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Don't require PR for uploading comment for DT CI (paritytech#10347)

# Description

Small PR that changes the DT CI to not require a PR for uploading the
report to the CI job.

---------

Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* add flow to create an old tag

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Thang X. Vu <zthangxv@gmail.com>
Co-authored-by: DenzelPenzel <15388928+DenzelPenzel@users.noreply.github.com>
Co-authored-by: Javier Viola <363911+pepoviola@users.noreply.github.com>
Co-authored-by: Alexander Cyon <Sajjon@users.noreply.github.com>
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Co-authored-by: Alexander Samusev <41779041+alvicsam@users.noreply.github.com>
Co-authored-by: PG Herveou <pgherveou@gmail.com>
Co-authored-by: Omar <OmarAbdulla7@hotmail.com>
Co-authored-by: Guillaume Thiolliere <gui.thiolliere@gmail.com>
Co-authored-by: BDevParity <bruno.devic@parity.io>
0xRVE pushed a commit that referenced this pull request Nov 18, 2025
RomarQ pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this pull request Dec 3, 2025