
[stable2506] Backport #8839 #9072

Merged
EgorPopelyaev merged 7 commits into stable2506 from backport-8839-to-stable2506
Jul 4, 2025

Conversation

@paritytech-release-backport-bot

Backport #8839 into stable2506 from Sajjon.

See the documentation on how to use this bot.

Implementation of #8758

# Description
The Authority Discovery crate has been changed so that the `AddrCache` is
persisted to `persisted_cache_file_path`, a JSON file in the
`net_config_path` folder controlled by `NetworkConfiguration`.

`AddrCache` is JSON-serialized (`serde_json::to_string_pretty`) and
persisted to file:
- periodically (every 10 minutes)
- on shutdown

Furthermore, this persisted `AddrCache` file is read upon start of the
worker; if it does not exist, or we fail to deserialize it, a new empty
cache is used.

`AddrCache` is made Serialize/Deserialize thanks to `PeerId` and
`Multiaddr` being made Serialize/Deserialize.

# Implementation
The worker uses a spawner in the [run loop of the
worker](https://github.com/paritytech/polkadot-sdk/blob/cyon/persist_peers_cache/substrate/client/authority-discovery/src/worker.rs#L361-L372),
where at an interval we try to persist the `AddrCache`.
We won't persist the `AddrCache` if `persisted_cache_file_path:
Option<PathBuf>` is `None`, which it is if
[`NetworkConfiguration`'s
`net_config_path`](https://github.com/paritytech/polkadot-sdk/blob/master/substrate/client/network/src/config.rs#L591)
is `None`. We spawn a new task each time the `interval` ticks, once
every 10 minutes, and it uses `fs::write` (there is also
`tokio::fs::write`, but it requires the `fs` feature flag of `tokio`,
which is not activated, so I chose not to use it). If the worker shuts
down, we try to persist without using the `spawner`.

# Changes
- New crate dependency: `serde_with` for the `SerializeDisplay` and
`DeserializeFromStr` macros
- `WorkerConfig` in the authority-discovery crate has a new field
`persisted_cache_directory: Option<PathBuf>`
- The `Worker` constructor in the authority-discovery crate now takes a
new parameter, `spawner: Arc<dyn SpawnNamed>`

## Tests
- The [authority-discovery
tests](substrate/client/authority-discovery/src/tests.rs) are changed
to use the tokio runtime (`#[tokio::test]`), and we pass a test worker
config with a `tempdir` for `persisted_cache_directory`

# `net_config_path`
Here are the `net_config_path` values (from `NetworkConfiguration`),
i.e. the folder used by this PR to save the serialized `AddrCache`:

## `dev`
```sh
cargo build --release && ./target/release/polkadot --dev
```

shows =>

`/var/folders/63/fs7x_3h16svftdz4g9bjk13h0000gn/T/substratey5QShJ/chains/rococo_dev/network/authority_discovery_addr_cache.json`

## `kusama`
```sh
cargo build --release && ./target/release/polkadot --chain kusama --validator
```

shows => `~/Library/Application
Support/polkadot/chains/ksmcc3/network/authority_discovery_addr_cache.json`

> [!CAUTION]
> The node shut down automatically with a scary error:
> ```
> Essential task `overseer` failed. Shutting down service.
> TCP listener terminated with error error=Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
> Installed transports terminated, ignore if the node is stopping
> Litep2p backend terminated
> Error:
>    0: Other: Essential task failed.
> ```
> This is maybe expected/correct, but I just wanted to flag it; expand the output below to see the full log.
>
> Or did I break anything?

<details><summary>Full Log with scary error (expand me 👈)</summary>
The log

```sh
$ ./target/release/polkadot --chain kusama --validator
2025-06-19 14:34:35 ----------------------------
2025-06-19 14:34:35 This chain is not in any way
2025-06-19 14:34:35       endorsed by the
2025-06-19 14:34:35      KUSAMA FOUNDATION
2025-06-19 14:34:35 ----------------------------
2025-06-19 14:34:35 Parity Polkadot
2025-06-19 14:34:35 ✌️  version 1.18.5-e6b86b54d31
2025-06-19 14:34:35 ❤️  by Parity Technologies <admin@parity.io>, 2017-2025
2025-06-19 14:34:35 📋 Chain specification: Kusama
2025-06-19 14:34:35 🏷  Node name: glamorous-game-6626
2025-06-19 14:34:35 👤 Role: AUTHORITY
2025-06-19 14:34:35 💾 Database: RocksDb at /Users/alexandercyon/Library/Application Support/polkadot/chains/ksmcc3/db/full
2025-06-19 14:34:39 Creating transaction pool txpool_type=SingleState ready=Limit { count: 8192, total_bytes: 20971520 } future=Limit { count: 819, total_bytes: 2097152 }
2025-06-19 14:34:39 🚀 Using prepare-worker binary at: "/Users/alexandercyon/Developer/Rust/polkadot-sdk/target/release/polkadot-prepare-worker"
2025-06-19 14:34:39 🚀 Using execute-worker binary at: "/Users/alexandercyon/Developer/Rust/polkadot-sdk/target/release/polkadot-execute-worker"
2025-06-19 14:34:39 Local node identity is: 12D3KooWPVh77R44wZwySBys262Jh4BSbpMFxtvQNmi1EpdcwDDW
2025-06-19 14:34:39 Running litep2p network backend
2025-06-19 14:34:40 💻 Operating system: macos
2025-06-19 14:34:40 💻 CPU architecture: aarch64
2025-06-19 14:34:40 📦 Highest known block at #1294645
2025-06-19 14:34:40 〽️ Prometheus exporter started at 127.0.0.1:9615
2025-06-19 14:34:40 Running JSON-RPC server: addr=127.0.0.1:9944,[::1]:9944
2025-06-19 14:34:40 🏁 CPU single core score: 1.35 GiBs, parallelism score: 1.44 GiBs with expected cores: 8
2025-06-19 14:34:40 🏁 Memory score: 63.75 GiBs
2025-06-19 14:34:40 🏁 Disk score (seq. writes): 2.92 GiBs
2025-06-19 14:34:40 🏁 Disk score (rand. writes): 727.56 MiBs
2025-06-19 14:34:40 CYON: 🔮 Good, path set to: /Users/alexandercyon/Library/Application Support/polkadot/chains/ksmcc3/network/authority_discovery_addr_cache.json
2025-06-19 14:34:40 🚨 Your system cannot securely run a validator.
Running validation of malicious PVF code has a higher risk of compromising this machine.
Secure mode is enabled only for Linux
and a full secure mode is enabled only for Linux x86-64.
You can ignore this error with the `--insecure-validator-i-know-what-i-do` command line argument if you understand and accept the risks of running insecurely. With this flag, security features are enabled on a best-effort basis, but not mandatory.
More information: https://docs.polkadot.com/infrastructure/running-a-validator/operational-tasks/general-management/#secure-your-validator
2025-06-19 14:34:40 Successfully persisted AddrCache on disk
2025-06-19 14:34:40 subsystem exited with error subsystem="candidate-validation" err=FromOrigin { origin: "candidate-validation", source: Context("could not enable Secure Validator Mode for non-Linux; check logs") }
2025-06-19 14:34:40 Starting workers
2025-06-19 14:34:40 Starting approval distribution workers
2025-06-19 14:34:40 👶 Starting BABE Authorship worker
2025-06-19 14:34:40 Starting approval voting workers
2025-06-19 14:34:40 Starting main subsystem loop
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="candidate-validation"
2025-06-19 14:34:40 Starting with an empty approval vote DB.
2025-06-19 14:34:40 subsystem finished unexpectedly subsystem=Ok(())
2025-06-19 14:34:40 🥩 BEEFY gadget waiting for BEEFY pallet to become available...
2025-06-19 14:34:40 Received `Conclude` signal, exiting
2025-06-19 14:34:40 Conclude
2025-06-19 14:34:40 received `Conclude` signal, exiting
2025-06-19 14:34:40 received `Conclude` signal, exiting
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="availability-recovery"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="bitfield-distribution"
2025-06-19 14:34:40 Approval distribution worker 3, exiting because of shutdown
2025-06-19 14:34:40 Approval distribution worker 2, exiting because of shutdown
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="dispute-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="chain-selection"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="pvf-checker"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="availability-store"
2025-06-19 14:34:40 Approval distribution worker 1, exiting because of shutdown
2025-06-19 14:34:40 Approval distribution worker 0, exiting because of shutdown
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="approval-voting"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="approval-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="chain-api"
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="provisioner"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="availability-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="runtime-api"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="candidate-backing"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="collation-generation"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="gossip-support"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="approval-voting-parallel"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="bitfield-signing"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="collator-protocol"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="statement-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="network-bridge-tx"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="network-bridge-rx"
2025-06-19 14:34:41 subsystem exited with error subsystem="prospective-parachains" err=FromOrigin { origin: "prospective-parachains", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
2025-06-19 14:34:41 subsystem exited with error subsystem="dispute-coordinator" err=FromOrigin { origin: "dispute-coordinator", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
2025-06-19 14:34:41 Essential task `overseer` failed. Shutting down service.
2025-06-19 14:34:41 TCP listener terminated with error error=Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
2025-06-19 14:34:41 Installed transports terminated, ignore if the node is stopping
2025-06-19 14:34:41 Litep2p backend terminated
Error:
   0: Other: Essential task failed.

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
```

🤔

</details>

## `kusama -d /my/custom/path`
```sh
cargo build --release && ./target/release/polkadot --chain kusama --validator --unsafe-force-node-key-generation -d /my/custom/path
```
shows => `./my/custom/path/chains/ksmcc3/network/` for `net_config_path`

## `test`

I've configured a `WorkerConfig` with a `tempdir` for all tests. To my
surprise, I had to call `fs::create_dir_all` in order for the temp
directory to actually be created.

---------

Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: alvicsam <alvicsam@gmail.com>
(cherry picked from commit ee6d22b)
@github-actions github-actions bot added the A3-backport Pull request is already reviewed well in another branch. label Jul 2, 2025
@github-actions github-actions bot requested a review from Sajjon July 2, 2025 13:14

github-actions bot commented Jul 2, 2025

This pull request is amending an existing release. Please proceed with extreme caution,
so as not to impact downstream teams that rely on its stability. Some things to consider:

  • Backports are only for 'patch' or 'minor' changes. No 'major' or other breaking change.
  • Should be a legit fix for some bug, not adding tons of new features.
  • Must either be already audited or not need an audit.
Emergency Bypass

If you really need to bypass this check: add validate: false to each crate
in the Prdoc where a breaking change is introduced. This will release a new major
version of that crate and all its reverse dependencies and basically break the release.

@EgorPopelyaev

@Sajjon Same here: the major bumps are not allowed in the backports to the existing stable branches, could it be re-done so that there won't be a breaking change?

@sandreim

sandreim commented Jul 3, 2025

@Sajjon Same here: the major bumps are not allowed in the backports to the existing stable branches, could it be re-done so that there won't be a breaking change?

Is 2506 already released ?

@EgorPopelyaev

Not yet, but we still wanted to keep the amount of breaking changes merged back as low as possible even for the upcoming release

@Sajjon

Sajjon commented Jul 3, 2025

@EgorPopelyaev btw the addition of the dependency sc-service is not needed https://github.com/paritytech/polkadot-sdk/pull/9072/files#diff-3b41c048455e068250da8641e6a33a883fa6e5cb9bce7d89270409627ba6b8d7R33

I just tested removing it and tests pass. So should I commit to this PR removing it ?

@EgorPopelyaev

@Sajjon yep, whatever works best for you :)

@Sajjon

Sajjon commented Jul 3, 2025

@Sajjon yep, whatever works best for you :)

pushed 93672f7

@Sajjon

Sajjon commented Jul 3, 2025

@EgorPopelyaev CI check is failing

```
error: failed to select a version for `assets-common`.
    ... required by package `asset-hub-rococo-runtime v0.25.2 (/__w/polkadot-sdk/polkadot-sdk/cumulus/parachains/runtimes/assets/asset-hub-rococo)`
versions that meet the requirements `^0.21.0` are: 0.21.0

the package `asset-hub-rococo-runtime` depends on `assets-common`, with features: `try-runtime` but `assets-common` does not have these features.

failed to select a version for `assets-common` which could resolve this conflict
```

What do you think about removing this line:
https://github.com/paritytech/polkadot-sdk/blob/backport-8839-to-stable2506/cumulus/parachains/runtimes/assets/asset-hub-rococo/Cargo.toml#L153

I think that will fix it? It was added 5 weeks ago here, I think.

@EgorPopelyaev

@Sajjon I guess this CI check needs a bit of a deeper look; if this deletion fixes the current error, it will most likely fail with another one. So I would leave it for a separate PR.

@EgorPopelyaev EgorPopelyaev enabled auto-merge (squash) July 4, 2025 14:08
@EgorPopelyaev EgorPopelyaev merged commit 90693fd into stable2506 Jul 4, 2025
239 of 242 checks passed
@EgorPopelyaev EgorPopelyaev deleted the backport-8839-to-stable2506 branch July 4, 2025 14:43