fix(walletconnect): centralize connection and retry logic#2508
fix(walletconnect): centralize connection and retry logic#2508
Conversation
This commit refactors the WalletConnect module to use a centralized state machine for managing the WebSocket connection, resolving a critical race condition and simplifying connection logic. Previously, scattered retry loops in each network-facing function attempted to handle disconnections, leading to duplicated code and potential race conditions. The new architecture introduces: 1. A single connection_lifecycle_task that owns the entire connection lifecycle, including initial connection, monitoring, and reconnection with exponential backoff. 2. A `tokio::sync::watch` channel to broadcast the connection status (Connecting, Connected, Disconnected) to all parts of the application. 3. A unified `await_connection` helper that acts as a gatekeeper, ensuring operations only proceed when the connection is active. This change removes the standalone `handle_disconnections` function and all per-RPC retry logic, making the system more robust and easier to maintain.
NotConnected race condition by centralizing connection state
smk762
left a comment
There was a problem hiding this comment.
LGTM!
I had a temporary error like
{
"mmrpc": "2.0",
"error": "Client Error: failed to connect",
"error_path": "new_connection.lib",
"error_trace": "new_connection:29] lib:236]",
"error_type": "SessionRequestError",
"error_data": "Client Error: failed to connect",
"id": null
}before getting the expected responses a couple of minutes later. I believe this was a local issue, and the fix is working as intended, so ✅
|
Although I am now able to create a connection, I'm not sure I am able to use it correctly. A few attempts to connect with scanning QR code with Trust wallet failed. I did have a successful attempt with ATOM on Keplr though, so might be just a Trust wallet issue. I'll test a few more. |
…actors or cleanups to follow in next commit
The `await_connection` function has been rewritten to loop until a `Connected` state is reached, preventing premature failures. This change re-introduces the `Connecting` state, which is now properly handled by the new waiting logic.
smk762
left a comment
There was a problem hiding this comment.
Confirmed functioning for both EVM and Tendermint, via keplr and metamask wallets.
Had some problems trying to connect with Trust Wallet and Bifrost (which appears to require tesstnet / dev mode).
Tested all wc_methods on legacy/hd/wasm/native and related activations with priv_key_policy using the new object structure
"priv_key_policy": {
"type": "WalletConnect",
"params": {
"session_topic": "45e250e14bec044dde7b007165608952032421387e6ac36e05654cd1fd175a99"
}
}Some secondary methods to check balance / tx_history were also confirmed to be reporting as expected.
mariocynicys
left a comment
There was a problem hiding this comment.
Thanks for the fix.
LGTM!
* dev: fix(eth): Correctly implement ETH max withdrawal logic (#2531) feat(use-clap-for-cli): use clap to parse CLI-Args #2215 (#2510) feat(orderbook): expirable maker orders (#2516) improvement(eth): drop parity support (#2527) chore(release): finalize changelog for v2.5.0-beta (#2524) chore(toolchain): upgrade toolchain to nightly 1.86.0 (#2444) feat(swap): rpc to find best swap with liquidity routing for ask (#2362) fix(kdf_walletconnect): apply explicit MmError mapping (#2514) fix(walletconnect): centralize connection and retry logic (#2508) fix(hw-wallet): avoid calling `get_enabled_address` in trezor-based coin init (#2504)
* refactor-gas-fee-policy: (30 commits) revert change in test_enable_custom_erc20 fix(test) disable get_swap_gas_fee_policy for tests (no platform coin in tests) revert tokio test and use block_on for eth docker test Refactor gas fee code: get max_eth_tx_type from platform coin, add gas fee adjust coin params, rename "set_swap_transaction_fee_policy" rpc to "set_swap_gas_fee_policy", force set_swap_gas_fee_policy to store policy in the platform coin fix(eth): Propagate structured EIP-1559 fee errors (GLEECBTC#2532) fix(eth): Correctly implement ETH max withdrawal logic (GLEECBTC#2531) feat(use-clap-for-cli): use clap to parse CLI-Args GLEECBTC#2215 (GLEECBTC#2510) feat(orderbook): expirable maker orders (GLEECBTC#2516) improvement(eth): drop parity support (GLEECBTC#2527) chore(release): finalize changelog for v2.5.0-beta (GLEECBTC#2524) chore(toolchain): upgrade toolchain to nightly 1.86.0 (GLEECBTC#2444) feat(swap): rpc to find best swap with liquidity routing for ask (GLEECBTC#2362) fix(kdf_walletconnect): apply explicit MmError mapping (GLEECBTC#2514) fix(walletconnect): centralize connection and retry logic (GLEECBTC#2508) fix(hw-wallet): avoid calling `get_enabled_address` in trezor-based coin init (GLEECBTC#2504) chore(core): replace hash_raw_entry with stable entry() API (GLEECBTC#2473) chore(core): adapt `MmError` and usages for compatibility with new rustc versions (GLEECBTC#2443) feat(wallet): add `delete_wallet` RPC (GLEECBTC#2497) chore(release): add changelog entries for v2.5.0-beta (GLEECBTC#2494) chore(release): bump kdf version to 2.5.0-beta (GLEECBTC#2492) ... # Conflicts: # mm2src/coins/eth.rs # mm2src/coins/eth/eth_swap_v2/eth_maker_swap_v2.rs # mm2src/coins/eth/eth_swap_v2/eth_taker_swap_v2.rs # mm2src/coins/eth/eth_utils.rs # mm2src/coins/eth/eth_withdraw.rs # mm2src/coins/eth/v2_activation.rs # mm2src/coins/lp_coins.rs # mm2src/coins/qrc20.rs # mm2src/coins/qrc20/swap.rs # mm2src/coins_activation/src/platform_coin_with_tokens.rs # mm2src/coins_activation/src/token.rs # mm2src/mm2_main/src/lp_swap/check_balance.rs # mm2src/mm2_main/src/lp_swap/maker_swap.rs # mm2src/mm2_main/src/rpc/dispatcher/dispatcher.rs # mm2src/mm2_main/src/rpc/lp_commands/ext_api/ext_api_types.rs # mm2src/mm2_main/src/rpc/lp_commands/mod.rs # mm2src/mm2_main/src/rpc/lp_commands/one_inch/errors.rs # mm2src/mm2_main/src/rpc/lp_commands/one_inch/rpcs.rs # mm2src/mm2_main/src/rpc/lp_commands/tokens.rs # mm2src/mm2_main/tests/docker_tests/eth_docker_tests.rs # mm2src/trading_api/src/one_inch_api/classic_swap_types.rs # mm2src/trading_api/src/one_inch_api/client.rs # mm2src/trading_api/src/one_inch_api/portfolio_types.rs
This PR fixes a critical race condition in the WalletConnect module that causes
Client(WebsocketClient(NotConnected))errors during initialization, particularly for thewc_new_connectionRPC call.The Problem
As reported in #2505, network operations could be attempted before the WebSocket connection was fully established or immediately after it was dropped. This resulted in an immediate failure with the following error, instead of returning a valid session URL:
{ "error": "Subscription Error: Client(WebsocketClient(NotConnected))", "error_type": "SessionRequestError" }This was a regression traced back to recent changes in #2485 that decoupled WalletConnect initialization from KDF initialization.
The Solution
This PR resolves the issue by refactoring the connection management logic into a robust, centralized state machine. The scattered, per-function retry loops have been removed in favor of a more reliable architecture.
The key changes are:
Centralized
connection_lifecycle_task: A single background task now owns the entire connection lifecycle. It is the single source of truth for the connection status and handles initial connection, monitoring for disconnects, and reconnection with exponential backoff.State Broadcast via watch Channel: The lifecycle task broadcasts the current connection status (Connecting, Connected, Disconnected) to the rest of the application using a
tokio::sync::watchchannel.await_connectionGatekeeper: All network-dependent functions (likepublish_payload,new_connection, etc.) now first call theawait_connectionhelper. This function acts as a gatekeeper, pausing execution until it receives a Connected signal from the lifecycle task.This new architecture completely eliminates the race condition by ensuring that no network operation can be attempted unless the connection is confirmed to be active. This makes the connection handling more resilient and easier to maintain.
Fixes #2505