fix(wasm): unify error handling for mm2_main#2389

Merged
shamardy merged 15 commits into dev from fix-wasm-exit-codes
Apr 14, 2025

Conversation

@shamardy
Collaborator

Startup failures were handled inconsistently between the native and WASM implementations of mm2_main:

  • In the native implementation, startup failures returned appropriate error codes as i8 values
  • In the wasm implementation, startup failures were only logged and never propagated back to JavaScript

This inconsistency made it difficult for the web GUI to reliably detect when mm2_main failed to start, which is particularly problematic when users are trying to log in. While the native GUI gets immediate feedback on failure, the web GUI had to rely on timeouts or other workarounds.

Changes

This PR fixes the inconsistency by:

  1. Making mm2_main in wasm return a Promise (async fn with Result<i8, JsValue>)
  2. Creating a unified StartupErrorCode enum shared between native and WASM
pub enum StartupErrorCode {
    /// Operation completed successfully
    Ok = 0,
    /// Invalid parameters were provided to the function
    InvalidParams = 1,
    /// The configuration was invalid (missing required fields, etc.)
    ConfigError = 2,
    /// MM2 is already running
    AlreadyRunning = 3,
    /// MM2 initialization failed
    InitError = 4,
    /// Failed to spawn the MM2 process/thread
    SpawnError = 5,
}
  3. Introducing a structured StartupError type with both code and descriptive message for the wasm implementation, allowing JavaScript to receive detailed error information.
  4. Properly propagating errors from lp_main as Promise rejections in the wasm implementation instead of just logging them.
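
A minimal sketch of how a result code and a structured error could fit together, written in plain Rust without wasm-bindgen. The enum mirrors the one listed above; the `StartupError` fields, the `new` constructor, and the `to_exit_code` helper are assumptions made for illustration and are not taken from the PR's code.

```rust
use std::fmt;

/// Mirrors the enum listed above; `repr(i8)` fixes the integer representation.
#[repr(i8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StartupErrorCode {
    Ok = 0,
    InvalidParams = 1,
    ConfigError = 2,
    AlreadyRunning = 3,
    InitError = 4,
    SpawnError = 5,
}

/// Hypothetical structured error: a machine-readable code plus a message.
#[derive(Debug, PartialEq, Eq)]
pub struct StartupError {
    pub code: StartupErrorCode,
    pub message: String,
}

impl StartupError {
    pub fn new(code: StartupErrorCode, message: impl Into<String>) -> Self {
        StartupError { code, message: message.into() }
    }
}

impl fmt::Display for StartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "startup failed with code {}: {}", self.code as i8, self.message)
    }
}

/// Collapses a startup result into the numeric code a native caller would see.
pub fn to_exit_code(result: &Result<(), StartupError>) -> i8 {
    match result {
        Ok(()) => StartupErrorCode::Ok as i8,
        Err(err) => err.code as i8,
    }
}
```

In a wasm build the same error value would be converted into the rejection value of the Promise rather than into an integer; that conversion is left out here because it depends on the binding layer.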

Architecture Improvement

As part of this fix, we've separated the previously combined initialization and runtime functionality into two distinct phases:

  • lp_main: Handles initialization and configuration, returns ctx when successful
  • lp_run: Takes ctx from lp_main and manages runtime execution

This separation provides cleaner error boundaries, letting us properly propagate initialization errors before moving to the runtime phase. In the wasm implementation, we now resolve the Promise when initialization succeeds and spawn the runtime as a separate task, allowing JavaScript to immediately receive startup success/failure feedback.

This change not only fixes the error reporting inconsistency but also improves the overall architecture by better separating concerns between initialization and runtime execution.
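
The two-phase split described above can be sketched with standard-library threads. This is an illustration only: `Ctx`, `init_phase`, `run_phase`, and `start` are hypothetical stand-ins for the context, `lp_main`, `lp_run`, and the caller, and the `coins` check is an assumed example of an initialization failure.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical context produced by the initialization phase.
#[derive(Debug)]
pub struct Ctx {
    pub coins: Vec<String>,
}

/// Initialization phase (stand-in for `lp_main`): validates input and returns a context.
pub fn init_phase(coins: Option<Vec<String>>) -> Result<Ctx, String> {
    let coins = coins.ok_or_else(|| "coins is null".to_string())?;
    Ok(Ctx { coins })
}

/// Runtime phase (stand-in for `lp_run`): consumes the context produced by init.
pub fn run_phase(ctx: Ctx) -> usize {
    // A real runtime would loop until shutdown; this stub only counts the coins.
    ctx.coins.len()
}

/// Returns as soon as init succeeds or fails; the runtime continues on its own thread.
pub fn start(coins: Option<Vec<String>>) -> Result<JoinHandle<usize>, String> {
    let ctx = init_phase(coins)?;
    Ok(thread::spawn(move || run_phase(ctx)))
}
```

Because `start` returns before the runtime finishes, an initialization error reaches the caller immediately, while a successful start hands back a handle to the still-running task.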

fixes #2383

@CharlVS CharlVS left a comment

@shamardy Thank you. I've tested the new wasm build, and it works reliably and provides much quicker feedback to the user than our previous approach.

With my testing of iOS today, I've realised we face the same issue for library builds where there's no reliable way for us to tell the result of the process and when it ended. This didn't come up because I hadn't tested failed login cases on iOS before today. Unfortunately, we can't ship iOS (even as a temporary measure) with the KDF executable bc of Apple's restrictions.

Native executable builds are different because Dart provides a callback for when the process has ended, and we can listen to the log output for that specific process to determine why it ended. In hindsight, KDF startup error handling has always been uniform across native vs wasm, but it's because of the non-uniform limitations on process handling of exe vs lib vs wasm.

Would it be a significant request to make the native function async with similar behaviour to the wasm build? Is that even possible since it's an exported C function? Alternatively, we could add an optional log callback argument to mm2_main, but this might not be ideal for you since it means an inconsistent mm2_main signature between native vs wasm.

@shamardy
Collaborator Author

> Would it be a significant request to make the native function async with similar behaviour to the wasm build? Is that even possible since it's an exported C function?

I don't think that is possible, but I will look into it.

> Alternatively, we could add an optional log callback argument to mm2_main, but this might not be ideal for you since it means an inconsistent mm2_main signature between native vs wasm.

Will look into that and other alternatives. But I will do that in a different PR after this is reviewed and merged by @KomodoPlatform/mm2 team.

CharlVS previously approved these changes Mar 17, 2025

@CharlVS CharlVS left a comment

TYSM! Working perfectly!

@CharlVS

CharlVS commented Mar 17, 2025

@shamardy I mentioned in my bug report a suggestion about adding a dedicated SSE for the KDF lifecycle, but this may be overly complicated, and a simple, more sustainable solution is to have a callback parameter in mm2_main for the same data.

Comment on lines +114 to +116
pub enum StartupErrorCode {
    /// Operation completed successfully
    Ok = 0,

It seems better to call this StartupResultCode instead of StartupErrorCode as it also includes success code.

Collaborator

Or remove this Ok variant altogether and return () for success instead of i8.

@onur-ozkan onur-ozkan Mar 18, 2025

() is a Rust-specific placeholder, whereas 0 is recognized as a success code in other languages.

Collaborator

I mean remove the Ok variant altogether from this enum.
the () suggestion is about the other functions in which we return Result<i8, Error>. there is no point of returning i8 in the success variant since it's already the Ok variant of the Result (unless we have different success types, which we currently don't).


I understand that, what I am saying is that 0 is useful when converting things from other languages (which is the reason why we use this type).

Collaborator Author

> the () suggestion is about the other functions in which we return Result<i8, Error>. there is no point of returning i8 in the success variant since it's already the Ok variant of the Result (unless we have different success types, which we currently don't).

I opted for returning 0 from wasm (javascript) function to allow GUIs to handle the success case the same way if possible. I would ask for @CharlVS opinion on this and which would be better for GUIs.

I will rename this to StartupResultCode for now

Collaborator Author

@shamardy shamardy Mar 18, 2025

> I will rename this to StartupResultCode for now

done


If you want to keep this type compatible with FFI, you should keep 0 inside of it.
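
To illustrate the FFI point, here is a minimal sketch of how a caller on the other side of a C boundary could decode the returned `i8`, with 0 kept as the success value. The variant list follows the PR description and the renamed `StartupResultCode`; the `TryFrom<i8>` impl and the `demo_startup` entry point are hypothetical and exist only for this example.

```rust
use std::convert::TryFrom;

/// Result codes as listed in the PR description, under the renamed type.
#[repr(i8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StartupResultCode {
    Ok = 0,
    InvalidParams = 1,
    ConfigError = 2,
    AlreadyRunning = 3,
    InitError = 4,
    SpawnError = 5,
}

/// Hypothetical decoder: unknown raw values are handed back unchanged as the error.
impl TryFrom<i8> for StartupResultCode {
    type Error = i8;

    fn try_from(raw: i8) -> Result<Self, i8> {
        match raw {
            0 => Ok(StartupResultCode::Ok),
            1 => Ok(StartupResultCode::InvalidParams),
            2 => Ok(StartupResultCode::ConfigError),
            3 => Ok(StartupResultCode::AlreadyRunning),
            4 => Ok(StartupResultCode::InitError),
            5 => Ok(StartupResultCode::SpawnError),
            other => Err(other),
        }
    }
}

/// Hypothetical C-ABI entry point that reports its outcome as a plain `i8`.
pub extern "C" fn demo_startup(already_running: bool) -> i8 {
    if already_running {
        StartupResultCode::AlreadyRunning as i8
    } else {
        StartupResultCode::Ok as i8
    }
}
```

Keeping `Ok = 0` in the enum means a C, Dart, or JavaScript caller can treat "zero is success" the usual way without knowing about Rust's `Result`.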

Comment on lines 72 to 75
if let Err(true) = LP_MAIN_RUNNING.compare_exchange(false, true, Ordering::Relaxed, Ordering::Relaxed) {
    log!("lp_main already started!");
    return;
}
Collaborator

shouldn't we propagate this as an error back to the caller? do we?

Collaborator Author

Done f61bac7
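
A minimal sketch of what propagating the "already started" case could look like, using the same `compare_exchange` guard as the reviewed snippet. The `RUNNING` flag, `try_claim_start`, and `release_start` are hypothetical names; the referenced commit may structure this differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the `LP_MAIN_RUNNING` flag in the snippet above.
static RUNNING: AtomicBool = AtomicBool::new(false);

/// Numeric code for "already running", matching `AlreadyRunning = 3` in the enum.
pub const ALREADY_RUNNING: i8 = 3;

/// Claims the "running" flag, or reports that an earlier start still holds it.
pub fn try_claim_start() -> Result<(), i8> {
    RUNNING
        .compare_exchange(false, true, Ordering::Relaxed, Ordering::Relaxed)
        .map(|_| ())
        .map_err(|_| ALREADY_RUNNING)
}

/// Releases the flag so that a later start can succeed.
pub fn release_start() {
    RUNNING.store(false, Ordering::Relaxed);
}
```

Returning the code instead of logging and returning early lets the caller tell a rejected second start apart from a successful one.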

Comment on lines +82 to +87
match block_on(mm2_main::lp_main(
    params,
    &ctx_cb,
    KDF_VERSION.into(),
    KDF_DATETIME.into(),
)) {
Collaborator

also errors from lp_main call

Collaborator Author

Done f61bac7
I propagated them all as init errors for now. We might later need new variants for lp_main errors so that more specific error codes can be propagated back.

onur-ozkan previously approved these changes Mar 19, 2025

@onur-ozkan onur-ozkan left a comment

LGTM

CharlVS previously approved these changes Mar 20, 2025

@CharlVS CharlVS left a comment

The last commit was tested to work as expected, and there have been no regressions since my previous review.

@smk762 smk762 self-requested a review April 8, 2025 10:40
@smk762

smk762 commented Apr 8, 2025

I tested triggering launch failures, which returned exit code 1: InvalidParams in both wasm and CLI for the following cases (CLI logs for reference):

  • 08 10:39:37, mm2:315] mm2:388] Password can't contain the word password
  • 08 10:41:55, mm2:315] mm2:345] Couldn't parse mm2 config to JSON format!
  • 08 10:45:56, mm2:315] mm2:388] mm2:170] lp_native_dex:435] lp_native_dex:629] Error deserializing 'seednodes' config field: invalid type: string "46.4.87.18", expected a sequence

I was unable to trigger exit code 2: ConfigError, even with nothing but {} in my MM2.json

When attempting to run a second instance of ./kdf with the same MM2.json file, I got error code: 101, not the expected Error code 3: AlreadyRunning.

Unsure how to trigger the other errors, let me know if it is simple enough. I recall there used to be a simulate_panic rpc - does this still exist? or did I dream it? If not, might be useful to mimic launch fails with force_code param.

@shamardy
Collaborator Author

shamardy commented Apr 10, 2025

> I tested triggering launch failures, which returned exit code 1: InvalidParams in both wasm and CLI for the following cases (CLI logs for reference):
>
> 08 10:39:37, mm2:315] mm2:388] Password can't contain the word password
> 08 10:41:55, mm2:315] mm2:345] Couldn't parse mm2 config to JSON format!
> 08 10:45:56, mm2:315] mm2:388] mm2:170] lp_native_dex:435] lp_native_dex:629] Error deserializing 'seednodes' config field: invalid type: string "46.4.87.18", expected a sequence

Was this using mm2_main from kdf as a lib or from starting the binary? This PR is for mm2_main provided by kdf as a lib.

> I was unable to trigger exit code 2: ConfigError, even with nothing but {} in my MM2.json

ConfigError can be triggered when coins is null in wasm, or if the config is not JSON in native. I believe that if I did this #2389 (comment), which might be a big change, it could result in totally different error variants.

> When attempting to run a second instance of ./kdf with the same MM2.json file, I got error code: 101, not the expected Error code 3: AlreadyRunning.

Can you provide more info on this? I want to know where error code: 101 originated from.

> Unsure how to trigger the other errors, let me know if it is simple enough.

For InitError, lp_main has to fail; it should have been triggered by the password case, for instance. Did you try it in wasm?

> I recall there used to be a simulate_panic rpc - does this still exist? or did I dream it? If not, might be useful to mimic launch fails with force_code param.

simulate_panic was removed here #2270. I don't think it actually tested any real-world scenario of actually running mm2/kdf.

Edit: Maybe fixing this #2389 (comment) and propagating errors back to the caller will fix your issues @smk762, as I see we log only the errors of lp_main. But for wasm it should have worked.

- Remove unnecessary Result return type from lp_run
- Add channel-based communication for initialization status
- Add configurable startup timeout with default of 60 seconds
@shamardy
Collaborator Author

shamardy commented Apr 10, 2025

Can you please try again for native, @smk762, if the issue was native-only? I also added a new config parameter called startup_timeout, which defaults to 60 seconds. It controls how long the main thread waits for KDF initialization to complete before returning an error. The wait is an indirect requirement for propagating initialization errors, such as the password one, back to the caller.

Edit: startup_timeout is removed ref. #2389 (comment)
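
The channel-based status reporting with a bounded wait, as described in the comment above, can be sketched with `std::sync::mpsc`. This is an assumption-laden illustration: `start_with_timeout` and its `init_ok` switch are hypothetical, and the timeout itself was later dropped from the PR, as the edit note says.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Spawns a worker that reports its initialization outcome over a channel,
/// then waits up to `timeout` for that report.
pub fn start_with_timeout(init_ok: bool, timeout: Duration) -> Result<(), String> {
    let (status_tx, status_rx) = mpsc::channel::<Result<(), String>>();

    thread::spawn(move || {
        // Hypothetical init step; a real one would perform the actual startup work.
        let status = if init_ok { Ok(()) } else { Err("init failed".to_string()) };
        // Ignore send errors: the receiver may already have timed out and been dropped.
        let _ = status_tx.send(status);
        // The runtime phase would continue here after reporting.
    });

    match status_rx.recv_timeout(timeout) {
        Ok(status) => status,
        Err(mpsc::RecvTimeoutError::Timeout) => Err("startup timed out".to_string()),
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err("worker exited before reporting".to_string())
        }
    }
}
```

Replacing `recv_timeout` with a plain `recv` gives the variant without a configurable timeout: the caller then blocks until the worker either reports or drops its sender.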

Collaborator

@mariocynicys mariocynicys left a comment

LGTM

@smk762

smk762 commented Apr 14, 2025

> Was this using mm2_main from kdf as a lib or from starting the binary?

it was from binary. I'll setup to retest this as lib

Comment on lines +115 to +116
log!("Failed to recover context in thread: {:?}", err);
return;
Collaborator

nit: we could panic here instead to trigger an erroneous log from the catch_unwind instead of "MM2 thread completed normally"

Collaborator Author

done
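
A minimal sketch of the suggestion above: panicking inside the thread so that `catch_unwind` reports a failure instead of a normal completion. The helper names `run_and_report` and `recover_or_panic`, and the returned strings, are hypothetical and only mimic the log messages mentioned in the review.

```rust
use std::panic;
use std::thread;

/// Runs `body` on a new thread and reports whether it finished or panicked.
pub fn run_and_report<F>(body: F) -> String
where
    F: FnOnce() + Send + panic::UnwindSafe + 'static,
{
    let handle = thread::spawn(move || match panic::catch_unwind(body) {
        Ok(()) => "thread completed normally".to_string(),
        Err(_) => "thread panicked".to_string(),
    });
    // The panic is caught inside the thread, so `join` itself does not fail here.
    handle.join().expect("panic was caught inside the thread")
}

/// Hypothetical recovery step: panics instead of returning early on failure.
pub fn recover_or_panic(ctx: Result<u32, String>) -> u32 {
    match ctx {
        Ok(id) => id,
        Err(err) => panic!("Failed to recover context in thread: {:?}", err),
    }
}
```

With an early `return` in place of the `panic!`, the closure would end normally and the failure would be indistinguishable from a clean shutdown, which is the problem the review comment points out. This relies on the default unwinding panic strategy; with `panic = "abort"` the process would terminate instead.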

@shamardy shamardy merged commit a8db94b into dev Apr 14, 2025
24 checks passed
@shamardy shamardy deleted the fix-wasm-exit-codes branch April 14, 2025 23:19
dimxy pushed a commit to dimxy/komodo-defi-framework that referenced this pull request May 13, 2025
* dev: (26 commits)
  chore(deps): remove base58 and replace it completely with bs58 (GLEECBTC#2427)
  feat(tron): initial groundwork for full TRON integration (GLEECBTC#2425)
  fix(UTXO): improve tx fee calculation and min relay fee handling (GLEECBTC#2316)
  deps(timed-map): bump to 1.3.1 (GLEECBTC#2413)
  improvement(tendermint): safer IBC channel handler (GLEECBTC#2298)
  chore(release): complete v2.4.0-beta changelogs  (GLEECBTC#2436)
  fix(event-streaming): initial addresses registration in utxo balance streaming (GLEECBTC#2431)
  improvement(watchers): re-write use-watchers handling (GLEECBTC#2430)
  fix(evm): make withdraw_nft work in HD mode (GLEECBTC#2424)
  feat(taproot): support parsing taproot output address types
  chore(RPC): use consistent param name for QTUM delegation (GLEECBTC#2419)
  fix(makerbot): add LiveCoinWatch price provider (GLEECBTC#2416)
  chore(release): add changelog entries for v2.4.0-beta (GLEECBTC#2415)
  fix(wallets): prevent path traversal in `wallet_file_path` and update file extension (GLEECBTC#2400)
  fix(nft): make `update_nft` work with hd wallets using the enabled address (GLEECBTC#2386)
  fix(wasm): unify error handling for mm2_main (GLEECBTC#2389)
  fix(tx-history): token information and query (GLEECBTC#2404)
  test(electrums): fix failing test_one_unavailable_electrum_proto_version (GLEECBTC#2412)
  improvement(network): remove static IPs from seed lists (GLEECBTC#2407)
  improvement(best-orders): return an rpc error when we can't find best orders (GLEECBTC#2318)
  ...