Skip to content

fix(walletconnect): centralize connection and retry logic#2508

Merged
shamardy merged 6 commits intostagingfrom
fix/wc-connection-retry
Jun 30, 2025
Merged

fix(walletconnect): centralize connection and retry logic#2508
shamardy merged 6 commits intostagingfrom
fix/wc-connection-retry

Conversation

@shamardy
Copy link
Copy Markdown
Collaborator

This PR fixes a critical race condition in the WalletConnect module that causes Client(WebsocketClient(NotConnected)) errors during initialization, particularly for the wc_new_connection RPC call.

The Problem

As reported in #2505, network operations could be attempted before the WebSocket connection was fully established or immediately after it was dropped. This resulted in an immediate failure with the following error, instead of returning a valid session URL:

{
    "error": "Subscription Error: Client(WebsocketClient(NotConnected))",
    "error_type": "SessionRequestError"
}

This was a regression traced back to recent changes in #2485 that decoupled WalletConnect initialization from KDF initialization.

The Solution

This PR resolves the issue by refactoring the connection management logic into a robust, centralized state machine. The scattered, per-function retry loops have been removed in favor of a more reliable architecture.

The key changes are:

  1. Centralized connection_lifecycle_task: A single background task now owns the entire connection lifecycle. It is the single source of truth for the connection status and handles initial connection, monitoring for disconnects, and reconnection with exponential backoff.

  2. State Broadcast via watch Channel: The lifecycle task broadcasts the current connection status (Connecting, Connected, Disconnected) to the rest of the application using a tokio::sync::watch channel.

  3. await_connection Gatekeeper: All network-dependent functions (like publish_payload, new_connection, etc.) now first call the await_connection helper. This function acts as a gatekeeper, pausing execution until it receives a Connected signal from the lifecycle task.

This new architecture completely eliminates the race condition by ensuring that no network operation can be attempted unless the connection is confirmed to be active. This makes the connection handling more resilient and easier to maintain.

Fixes #2505

This commit refactors the WalletConnect module to use a centralized state machine for managing the WebSocket connection, resolving a critical race condition and simplifying connection logic.
Previously, scattered retry loops in each network-facing function attempted to handle disconnections, leading to duplicated code and potential race conditions.
The new architecture introduces:
1. A single connection_lifecycle_task that owns the entire connection lifecycle, including initial connection, monitoring, and reconnection with exponential backoff.
2. A `tokio::sync::watch` channel to broadcast the connection status (Connecting, Connected, Disconnected) to all parts of the application.
3. A unified `await_connection` helper that acts as a gatekeeper, ensuring operations only proceed when the connection is active.
This change removes the standalone `handle_disconnections` function and all per-RPC retry logic, making the system more robust and easier to maintain.
@shamardy shamardy changed the title fix(walletconnect): Resolve NotConnected race condition by centralizing connection state fix(walletconnect): centralize connection and retry logic Jun 27, 2025
@shamardy shamardy added priority: urgent Critical tasks requiring immediate action. 2.5.0-beta status: pending review and removed status: in progress labels Jun 27, 2025
@smk762 smk762 self-requested a review June 27, 2025 09:54
smk762
smk762 previously approved these changes Jun 27, 2025
Copy link
Copy Markdown

@smk762 smk762 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I had a temporary error like

{
    "mmrpc": "2.0",
    "error": "Client Error: failed to connect",
    "error_path": "new_connection.lib",
    "error_trace": "new_connection:29] lib:236]",
    "error_type": "SessionRequestError",
    "error_data": "Client Error: failed to connect",
    "id": null
}

before getting the expected responses a couple of minutes later. I believe this was a local issue, and the fix is working as intended, so ✅

@smk762
Copy link
Copy Markdown

smk762 commented Jun 27, 2025

Although I am now able to create a connection, I'm not sure I am able to use it correctly.

A few attempts to connect with scanning QR code with Trust wallet failed.

kdf-1  | 27 10:47:22, kdf_walletconnect:451] INFO [8c46238e5da8cad21c92dc7105f836b4938e8b46c3cb00c29b45c27931e343ce] Publishing message=Request(Request { id: MessageId(448261438189834), jsonrpc: "2.0", params: SessionRequest(SessionRequestRequest { request: Request { method: "personal_sign", params: Array [String("0x41757468656e7469636174652077697468204b4446"), String("0xf5c6738ff36b62a032d11c4badeebf088db58e70")], expiry: None }, chain_id: "eip155:1" }) })
kdf-1  | 27 10:47:23, kdf_walletconnect:472] INFO [8c46238e5da8cad21c92dc7105f836b4938e8b46c3cb00c29b45c27931e343ce] Message published successfully
kdf-1  | 27 10:47:24, kdf_walletconnect:323] INFO [8c46238e5da8cad21c92dc7105f836b4938e8b46c3cb00c29b45c27931e343ce] Inbound message payload={"id":448261438189834,"jsonrpc":"2.0","error":{"code":5300,"message":"Invalid Session Properties requested"}}

I did have a successful attempt with ATOM on Keplr though, so might be just a Trust wallet issue. I'll test a few more.

The `await_connection` function has been rewritten to loop until a
`Connected` state is reached, preventing premature failures.

This change re-introduces the `Connecting` state, which is now
properly handled by the new waiting logic.
@shamardy shamardy requested review from mariocynicys and smk762 June 27, 2025 16:35
@shamardy shamardy requested a review from mariocynicys June 30, 2025 08:47
Copy link
Copy Markdown

@smk762 smk762 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed functioning for both EVM and Tendermint, via keplr and metamask wallets.
Had some problems trying to connect with Trust Wallet and Bifrost (which appears to require tesstnet / dev mode).

Tested all wc_methods on legacy/hd/wasm/native and related activations with priv_key_policy using the new object structure

        "priv_key_policy": {
            "type": "WalletConnect",
            "params": {
                "session_topic": "45e250e14bec044dde7b007165608952032421387e6ac36e05654cd1fd175a99"
            }
        }

Some secondary methods to check balance / tx_history were also confirmed to be reporting as expected.

Copy link
Copy Markdown
Collaborator

@mariocynicys mariocynicys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.
LGTM!

@shamardy shamardy merged commit 46ab1fb into staging Jun 30, 2025
18 of 24 checks passed
@shamardy shamardy deleted the fix/wc-connection-retry branch June 30, 2025 11:04
dimxy pushed a commit that referenced this pull request Jul 13, 2025
* dev:
  fix(eth): Correctly implement ETH max withdrawal logic (#2531)
  feat(use-clap-for-cli): use clap to parse CLI-Args #2215 (#2510)
  feat(orderbook): expirable maker orders (#2516)
  improvement(eth): drop parity support (#2527)
  chore(release): finalize changelog for v2.5.0-beta (#2524)
  chore(toolchain): upgrade toolchain to nightly 1.86.0 (#2444)
  feat(swap): rpc to find best swap with liquidity routing for ask (#2362)
  fix(kdf_walletconnect): apply explicit MmError mapping (#2514)
  fix(walletconnect): centralize connection and retry logic (#2508)
  fix(hw-wallet): avoid calling `get_enabled_address` in trezor-based coin init (#2504)
dimxy pushed a commit to dimxy/komodo-defi-framework that referenced this pull request Jul 16, 2025
* refactor-gas-fee-policy: (30 commits)
  revert change in test_enable_custom_erc20
  fix(test) disable get_swap_gas_fee_policy for tests (no platform coin in tests)
  revert tokio test and use block_on for eth docker test
  Refactor gas fee code: get max_eth_tx_type from platform coin, add gas fee adjust coin params, rename "set_swap_transaction_fee_policy" rpc to "set_swap_gas_fee_policy", force set_swap_gas_fee_policy to store policy in the platform coin
  fix(eth): Propagate structured EIP-1559 fee errors (GLEECBTC#2532)
  fix(eth): Correctly implement ETH max withdrawal logic (GLEECBTC#2531)
  feat(use-clap-for-cli): use clap to parse CLI-Args GLEECBTC#2215 (GLEECBTC#2510)
  feat(orderbook): expirable maker orders (GLEECBTC#2516)
  improvement(eth): drop parity support (GLEECBTC#2527)
  chore(release): finalize changelog for v2.5.0-beta (GLEECBTC#2524)
  chore(toolchain): upgrade toolchain to nightly 1.86.0 (GLEECBTC#2444)
  feat(swap): rpc to find best swap with liquidity routing for ask (GLEECBTC#2362)
  fix(kdf_walletconnect): apply explicit MmError mapping (GLEECBTC#2514)
  fix(walletconnect): centralize connection and retry logic (GLEECBTC#2508)
  fix(hw-wallet): avoid calling `get_enabled_address` in trezor-based coin init (GLEECBTC#2504)
  chore(core): replace hash_raw_entry with stable entry() API (GLEECBTC#2473)
  chore(core): adapt `MmError` and usages for compatibility with new rustc versions (GLEECBTC#2443)
  feat(wallet): add `delete_wallet` RPC (GLEECBTC#2497)
  chore(release): add changelog entries for v2.5.0-beta (GLEECBTC#2494)
  chore(release): bump kdf version to 2.5.0-beta (GLEECBTC#2492)
  ...

# Conflicts:
#	mm2src/coins/eth.rs
#	mm2src/coins/eth/eth_swap_v2/eth_maker_swap_v2.rs
#	mm2src/coins/eth/eth_swap_v2/eth_taker_swap_v2.rs
#	mm2src/coins/eth/eth_utils.rs
#	mm2src/coins/eth/eth_withdraw.rs
#	mm2src/coins/eth/v2_activation.rs
#	mm2src/coins/lp_coins.rs
#	mm2src/coins/qrc20.rs
#	mm2src/coins/qrc20/swap.rs
#	mm2src/coins_activation/src/platform_coin_with_tokens.rs
#	mm2src/coins_activation/src/token.rs
#	mm2src/mm2_main/src/lp_swap/check_balance.rs
#	mm2src/mm2_main/src/lp_swap/maker_swap.rs
#	mm2src/mm2_main/src/rpc/dispatcher/dispatcher.rs
#	mm2src/mm2_main/src/rpc/lp_commands/ext_api/ext_api_types.rs
#	mm2src/mm2_main/src/rpc/lp_commands/mod.rs
#	mm2src/mm2_main/src/rpc/lp_commands/one_inch/errors.rs
#	mm2src/mm2_main/src/rpc/lp_commands/one_inch/rpcs.rs
#	mm2src/mm2_main/src/rpc/lp_commands/tokens.rs
#	mm2src/mm2_main/tests/docker_tests/eth_docker_tests.rs
#	mm2src/trading_api/src/one_inch_api/classic_swap_types.rs
#	mm2src/trading_api/src/one_inch_api/client.rs
#	mm2src/trading_api/src/one_inch_api/portfolio_types.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority: urgent Critical tasks requiring immediate action. status: pending review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants