Conversation
Implementation of #8758

# Description

The Authority Discovery crate has been changed so that the `AddrCache` is persisted to `persisted_cache_file_path`, a JSON file in the `net_config_path` folder controlled by `NetworkConfiguration`. The `AddrCache` is JSON-serialized (`serde_json::to_string_pretty`) and persisted to file:

- periodically (every 10 minutes)
- on shutdown

Furthermore, the persisted `AddrCache` file is read upon start of the worker; if it does not exist, or we fail to deserialize it, a new empty cache is used. `AddrCache` is made Serialize/Deserialize thanks to `PeerId` and `Multiaddr` being made Serialize/Deserialize.

# Implementation

The worker uses a spawner in the [run loop, where at an interval we try to persist the `AddrCache`](https://github.com/paritytech/polkadot-sdk/blob/cyon/persist_peers_cache/substrate/client/authority-discovery/src/worker.rs#L361-L372). We won't persist the `AddrCache` if `persisted_cache_file_path: Option<PathBuf>` is `None`, which it is if [`NetworkConfiguration`'s `net_config_path`](https://github.com/paritytech/polkadot-sdk/blob/master/substrate/client/network/src/config.rs#L591) is `None`. We spawn a new task each time the `interval` "ticks" (once every 10 minutes), and it uses `fs::write` (there is also a `tokio::fs::write`, but it requires the `fs` feature flag of `tokio`, which is not activated, so I chose not to use it). If the worker shuts down, we try to persist without using the `spawner`.
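The persist/skip behavior described above can be sketched with std only. This is a simplified illustration, not the PR's actual code: `persist_addr_cache`, `load_addr_cache_or_default`, and the plain-string cache are hypothetical stand-ins (the real worker serializes the whole `AddrCache` with `serde_json` and triggers the write from its interval via the spawner):

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical stand-in for the JSON-serialized `AddrCache`.
type SerializedCache = String;

/// Persist the cache to disk. If no path is configured (because
/// `net_config_path` was `None`), skip and report that nothing was written.
fn persist_addr_cache(cache: &SerializedCache, path: Option<&PathBuf>) -> io::Result<bool> {
    match path {
        None => Ok(false), // persistence disabled: no `net_config_path`
        Some(p) => {
            fs::write(p, cache)?; // plain std `fs::write`, as in the PR
            Ok(true)
        }
    }
}

/// On startup: read the persisted cache, falling back to an empty one
/// if no path is configured or the file is missing/unreadable.
fn load_addr_cache_or_default(path: Option<&PathBuf>) -> SerializedCache {
    path.and_then(|p| fs::read_to_string(p).ok())
        .unwrap_or_default()
}

fn main() -> io::Result<()> {
    let file = std::env::temp_dir().join("authority_discovery_addr_cache_demo.json");
    let cache: SerializedCache = r#"{"peers":{}}"#.to_string();

    // With a configured path the cache is written...
    assert!(persist_addr_cache(&cache, Some(&file))?);
    // ...and read back on "startup".
    assert_eq!(load_addr_cache_or_default(Some(&file)), cache);

    // With no path configured, nothing is persisted and an empty cache is used.
    assert!(!persist_addr_cache(&cache, None)?);
    assert_eq!(load_addr_cache_or_default(None), String::new());
    Ok(())
}
```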
# Changes

- New crate dependency: `serde_with`, for the `SerializeDisplay` and `DeserializeFromStr` macros
- `WorkerConfig` in the authority-discovery crate has a new field: `persisted_cache_directory: Option<PathBuf>`
- The `Worker` constructor in the authority-discovery crate now takes a new parameter: `spawner: Arc<dyn SpawnNamed>`

## Tests

The [authority-discovery tests](substrate/client/authority-discovery/src/tests.rs) are changed to use the tokio runtime (`#[tokio::test]`), and we pass a test worker config with a `tempdir` for `persisted_cache_directory`.

# `net_config_path`

Here are the `net_config_path` values (from `NetworkConfiguration`), i.e. the folder used by this PR to save the serialized `AddrCache`:

## `dev`

```sh
cargo build --release && ./target/release/polkadot --dev
```

shows => `/var/folders/63/fs7x_3h16svftdz4g9bjk13h0000gn/T/substratey5QShJ/chains/rococo_dev/network/authority_discovery_addr_cache.json`

## `kusama`

```sh
cargo build --release && ./target/release/polkadot --chain kusama --validator
```

shows => `~/Library/Application Support/polkadot/chains/ksmcc3/network/authority_discovery_addr_cache.json`

> [!CAUTION]
> The node shut down automatically with a scary error:
>
> ```
> Essential task `overseer` failed. Shutting down service.
> TCP listener terminated with error error=Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
> Installed transports terminated, ignore if the node is stopping
> Litep2p backend terminated
> Error:
>    0: Other: Essential task failed.
> ```
>
> This is maybe expected/correct, but I just wanted to flag it; expand `output` below to see the full log.
>
> Or did I break anything?
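As an aside on the `serde_with` dependency listed in the changes above: its `SerializeDisplay` and `DeserializeFromStr` derive macros serialize a type via its `Display` impl and parse it back via `FromStr`. A std-only sketch of that round-trip contract, with a hypothetical `FakePeerId` standing in for the real `PeerId`:

```rust
use std::fmt;
use std::str::FromStr;

// Hypothetical stand-in for `PeerId`. The real type comes from the
// networking stack and gains Serialize/Deserialize via serde_with's
// `SerializeDisplay` / `DeserializeFromStr` derives.
#[derive(Debug, Clone, PartialEq)]
struct FakePeerId(String);

impl fmt::Display for FakePeerId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl FromStr for FakePeerId {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if s.is_empty() {
            Err("empty peer id".to_string())
        } else {
            Ok(FakePeerId(s.to_string()))
        }
    }
}

// What the derives rely on: serializing is `to_string`, deserializing
// is `parse`, and the two must round-trip losslessly.
fn round_trips(id: &FakePeerId) -> bool {
    id.to_string().parse() == Ok(id.clone())
}

fn main() {
    let id = FakePeerId("12D3KooW-example".to_string());
    assert!(round_trips(&id));
}
```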
<details><summary>Full Log with scary error (expand me 👈)</summary>

The log:

```sh
$ ./target/release/polkadot --chain kusama --validator
2025-06-19 14:34:35 ----------------------------
2025-06-19 14:34:35 This chain is not in any way
2025-06-19 14:34:35 endorsed by the
2025-06-19 14:34:35 KUSAMA FOUNDATION
2025-06-19 14:34:35 ----------------------------
2025-06-19 14:34:35 Parity Polkadot
2025-06-19 14:34:35 ✌️ version 1.18.5-e6b86b54d31
2025-06-19 14:34:35 ❤️ by Parity Technologies <admin@parity.io>, 2017-2025
2025-06-19 14:34:35 📋 Chain specification: Kusama
2025-06-19 14:34:35 🏷 Node name: glamorous-game-6626
2025-06-19 14:34:35 👤 Role: AUTHORITY
2025-06-19 14:34:35 💾 Database: RocksDb at /Users/alexandercyon/Library/Application Support/polkadot/chains/ksmcc3/db/full
2025-06-19 14:34:39 Creating transaction pool txpool_type=SingleState ready=Limit { count: 8192, total_bytes: 20971520 } future=Limit { count: 819, total_bytes: 2097152 }
2025-06-19 14:34:39 🚀 Using prepare-worker binary at: "/Users/alexandercyon/Developer/Rust/polkadot-sdk/target/release/polkadot-prepare-worker"
2025-06-19 14:34:39 🚀 Using execute-worker binary at: "/Users/alexandercyon/Developer/Rust/polkadot-sdk/target/release/polkadot-execute-worker"
2025-06-19 14:34:39 Local node identity is: 12D3KooWPVh77R44wZwySBys262Jh4BSbpMFxtvQNmi1EpdcwDDW
2025-06-19 14:34:39 Running litep2p network backend
2025-06-19 14:34:40 💻 Operating system: macos
2025-06-19 14:34:40 💻 CPU architecture: aarch64
2025-06-19 14:34:40 📦 Highest known block at #1294645
2025-06-19 14:34:40 〽️ Prometheus exporter started at 127.0.0.1:9615
2025-06-19 14:34:40 Running JSON-RPC server: addr=127.0.0.1:9944,[::1]:9944
2025-06-19 14:34:40 🏁 CPU single core score: 1.35 GiBs, parallelism score: 1.44 GiBs with expected cores: 8
2025-06-19 14:34:40 🏁 Memory score: 63.75 GiBs
2025-06-19 14:34:40 🏁 Disk score (seq. writes): 2.92 GiBs
2025-06-19 14:34:40 🏁 Disk score (rand. writes): 727.56 MiBs
2025-06-19 14:34:40 CYON: 🔮 Good, path set to: /Users/alexandercyon/Library/Application Support/polkadot/chains/ksmcc3/network/authority_discovery_addr_cache.json
2025-06-19 14:34:40 🚨 Your system cannot securely run a validator. Running validation of malicious PVF code has a higher risk of compromising this machine. Secure mode is enabled only for Linux and a full secure mode is enabled only for Linux x86-64. You can ignore this error with the `--insecure-validator-i-know-what-i-do` command line argument if you understand and accept the risks of running insecurely. With this flag, security features are enabled on a best-effort basis, but not mandatory. More information: https://docs.polkadot.com/infrastructure/running-a-validator/operational-tasks/general-management/#secure-your-validator
2025-06-19 14:34:40 Successfully persisted AddrCache on disk
2025-06-19 14:34:40 subsystem exited with error subsystem="candidate-validation" err=FromOrigin { origin: "candidate-validation", source: Context("could not enable Secure Validator Mode for non-Linux; check logs") }
2025-06-19 14:34:40 Starting workers
2025-06-19 14:34:40 Starting approval distribution workers
2025-06-19 14:34:40 👶 Starting BABE Authorship worker
2025-06-19 14:34:40 Starting approval voting workers
2025-06-19 14:34:40 Starting main subsystem loop
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="candidate-validation"
2025-06-19 14:34:40 Starting with an empty approval vote DB.
2025-06-19 14:34:40 subsystem finished unexpectedly subsystem=Ok(())
2025-06-19 14:34:40 🥩 BEEFY gadget waiting for BEEFY pallet to become available...
2025-06-19 14:34:40 Received `Conclude` signal, exiting
2025-06-19 14:34:40 Conclude
2025-06-19 14:34:40 received `Conclude` signal, exiting
2025-06-19 14:34:40 received `Conclude` signal, exiting
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="availability-recovery"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="bitfield-distribution"
2025-06-19 14:34:40 Approval distribution worker 3, exiting because of shutdown
2025-06-19 14:34:40 Approval distribution worker 2, exiting because of shutdown
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="dispute-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="chain-selection"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="pvf-checker"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="availability-store"
2025-06-19 14:34:40 Approval distribution worker 1, exiting because of shutdown
2025-06-19 14:34:40 Approval distribution worker 0, exiting because of shutdown
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="approval-voting"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="approval-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="chain-api"
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Approval distribution stream finished, most likely shutting down
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="provisioner"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="availability-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="runtime-api"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="candidate-backing"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="collation-generation"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="gossip-support"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="approval-voting-parallel"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="bitfield-signing"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="collator-protocol"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="statement-distribution"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="network-bridge-tx"
2025-06-19 14:34:40 Terminating due to subsystem exit subsystem="network-bridge-rx"
2025-06-19 14:34:41 subsystem exited with error subsystem="prospective-parachains" err=FromOrigin { origin: "prospective-parachains", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
2025-06-19 14:34:41 subsystem exited with error subsystem="dispute-coordinator" err=FromOrigin { origin: "dispute-coordinator", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
2025-06-19 14:34:41 Essential task `overseer` failed. Shutting down service.
2025-06-19 14:34:41 TCP listener terminated with error error=Custom { kind: Other, error: "A Tokio 1.x context was found, but it is being shutdown." }
2025-06-19 14:34:41 Installed transports terminated, ignore if the node is stopping
2025-06-19 14:34:41 Litep2p backend terminated
Error:
   0: Other: Essential task failed.

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
```

🤔

</details>

## `kusama -d /my/custom/path`

```sh
cargo build --release && ./target/release/polkadot --chain kusama --validator --unsafe-force-node-key-generation -d /my/custom/path
```

shows => `./my/custom/path/chains/ksmcc3/network/` for `net_config_path`

## `test`

I've configured a `WorkerConfig` with a `tempfile` for all tests.
To my surprise, I had to call `fs::create_dir_all` in order for the tempdir to actually be created.

---------

Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: alvicsam <alvicsam@gmail.com>

(cherry picked from commit ee6d22b)
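The `fs::create_dir_all` surprise mentioned above reproduces with plain std: `fs::write` does not create missing parent directories, so writing into a not-yet-created directory fails until the tree is created first. A small sketch; `ensure_and_write` and the demo path are hypothetical, not code from this PR:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the parent directory tree, then write the file.
/// A bare `fs::write` would fail if the parents are missing.
fn ensure_and_write(file: &Path, contents: &str) -> io::Result<()> {
    if let Some(parent) = file.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(file, contents)
}

fn main() -> io::Result<()> {
    // Hypothetical demo path: a nested directory that does not exist yet.
    let dir = std::env::temp_dir().join("addr_cache_demo").join("network");
    let file = dir.join("authority_discovery_addr_cache.json");

    // Clean slate so the failure below is reproducible.
    let _ = fs::remove_dir_all(std::env::temp_dir().join("addr_cache_demo"));

    // Without the parent directory, a bare `fs::write` fails...
    assert!(fs::write(&file, "{}").is_err());

    // ...while creating the tree first succeeds.
    ensure_and_write(&file, "{}")?;
    assert_eq!(fs::read_to_string(&file)?, "{}");
    Ok(())
}
```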
> This pull request is amending an existing release. Please proceed with extreme caution.
>
> **Emergency Bypass**
>
> If you really need to bypass this check: add …
@Sajjon Same here: major bumps are not allowed in backports to the existing stable branches. Could it be re-done so that there won't be a breaking change?
Is 2506 already released?
Not yet, but we still want to keep the number of breaking changes merged back as low as possible, even for the upcoming release.
@EgorPopelyaev Btw, regarding the added dependency: I just tested removing it, and the tests pass. Should I push a commit to this PR that removes it?
@Sajjon yep, whatever works best for you :)
@EgorPopelyaev The CI check is failing. What do you think about removing this line? I think that will fix it. It was added 5 weeks ago, here, I think.
@Sajjon I guess this CI check needs a somewhat deeper look; even if this deletion fixes the current error, it will most likely fail with another one. So I would leave it for a separate PR.
Backport #8839 into `stable2506` from Sajjon.

See the documentation on how to use this bot.