fix(orderbook): validate roots before commit by shamardy · Pull Request #2605 · GLEECBTC/komodo-defi-framework

shamardy · 2025-08-20T20:17:49Z

Background

After the P2P upgrade in feat(network): upgrade p2p layer #1878 (“broadcast one topic at a time”), keep‑alive messages no longer implied deletion when a pair’s root was absent. Combined with a single pubkey‑wide timeout, a seed that missed a cancel could keep an order until the maker went offline.
After fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives #2580, older nodes still didn't clear local pairs on null roots, leading to bad orderbooks and repeated warnings like “Couldn’t find an order … it will be synced upon pubkey keep alive,” and errors such as InvalidStateRoot([0,…]) (see investigate pubkey keep alive warnings in logs #2594).
This PR finalizes the receiver‑side handling for the fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives #2580 change by:
- Validating per‑pair diffs strictly against maker‑advertised roots and clearing pairs on mismatch.
- Tracking per‑pair liveness locally (pair_last_seen_local) and pruning stale pairs, removing a pubkey only when all its pairs are stale.

What this PR changes

Commit-after-validate (per pair)
- On PubkeyKeepAlive, compute expected roots per pair from the maker.
- Request diffs/full tries only from the keep‑alive’s origin peer (propagated_from).
- Ignore unsolicited pairs in the sync response.
- Apply the diff/full trie and recompute the root per pair; commit only if it exactly matches the expected root.
- On mismatch, reject the data, clear that pair locally, and do not propagate the message.
- The same “do not propagate” rule applies to stale/replayed keep‑alives handled as StaleKeepAlive.
- Propagation: forward the keep‑alive only if all requested pairs were validated and applied successfully; if any pair is stale, mismatched, or remains unresolved, the message is not propagated.
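The commit-after-validate steps above can be sketched roughly as follows. This is a minimal illustration, not the code in lp_ordermatch.rs: `PairOrders`, `Diff`, `root_of`, `apply_validated`, and `PairOutcome` are hypothetical names, and a plain hash over a sorted map stands in for the real trie root.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Orders of one pair keyed by order id (hypothetical flat stand-in for the trie).
type PairOrders = BTreeMap<String, u64>;

/// One diff entry: `Some(price)` inserts or updates an order, `None` deletes it.
type Diff = Vec<(String, Option<u64>)>;

/// Assumption: a hash over the sorted orders plays the role of the trie root.
fn root_of(orders: &PairOrders) -> u64 {
    let mut hasher = DefaultHasher::new();
    orders.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
enum PairOutcome {
    /// The recomputed root matched, so the new state was committed.
    Committed,
    /// The recomputed root differed, so the pair was cleared locally.
    ClearedOnMismatch,
}

/// Apply `diff` to a scratch copy and commit only if the recomputed root
/// equals the maker-advertised `expected_root`; otherwise clear the pair.
fn apply_validated(local: &mut PairOrders, diff: &Diff, expected_root: u64) -> PairOutcome {
    let mut scratch = local.clone();
    for (id, change) in diff {
        match change {
            Some(price) => {
                scratch.insert(id.clone(), *price);
            },
            None => {
                scratch.remove(id);
            },
        }
    }
    if root_of(&scratch) == expected_root {
        *local = scratch;
        PairOutcome::Committed
    } else {
        local.clear();
        PairOutcome::ClearedOnMismatch
    }
}
```

Working on a scratch copy means the local state is never left half-applied: it ends up either as the validated new state or cleared.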
Unresolved pairs
- If any requested pairs remain unresolved after the origin‑peer sync, return a SyncFailure (treated as a warning) and do not propagate the message.
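The propagation rule can be read as a reduction over per-pair results. The sketch below is an assumption-laden illustration: `PairStatus`, `Decision`, and `decide` are hypothetical names, and checking unresolved pairs before the other failure kinds is a choice made here, not something the description specifies.

```rust
/// Per-pair result of processing one keep-alive (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PairStatus {
    Applied,
    Stale,
    Mismatched,
    Unresolved,
}

#[derive(Debug, PartialEq)]
enum Decision {
    /// Every requested pair was validated and applied.
    Propagate,
    /// At least one pair was stale or mismatched: do not forward.
    Drop,
    /// The origin peer left these pairs unresolved: warn and do not forward.
    SyncFailure(Vec<String>),
}

fn decide(results: &[(String, PairStatus)]) -> Decision {
    let unresolved: Vec<String> = results
        .iter()
        .filter(|(_, status)| *status == PairStatus::Unresolved)
        .map(|(pair, _)| pair.clone())
        .collect();
    if !unresolved.is_empty() {
        return Decision::SyncFailure(unresolved);
    }
    if results.iter().all(|(_, status)| *status == PairStatus::Applied) {
        Decision::Propagate
    } else {
        Decision::Drop
    }
}
```

Only the all-applied case forwards the message; every other combination stops it at this node.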
Stale/replay protection (per pair)
- Introduces a per‑pair monotonic maker timestamp gate (latest_root_timestamp_by_pair) and a local per‑pair last‑seen clock (pair_last_seen_local).
- Rejects stale keep‑alives with OrderbookP2PHandlerError::StaleKeepAlive.
- Replay guard: we intentionally retain latest_root_timestamp_by_pair entries even after a pair is pruned, to block stale replays of old roots; these entries are dropped only when the entire pubkey state is removed (which is still not ideal).
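A minimal sketch of the per-pair gate, assuming a keep-alive is accepted only when its maker timestamp is strictly greater than the stored one (the exact comparison is an assumption). The field names follow the description; `PubkeyState`, `GateError`, `accept`, and `prune_pair` are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum GateError {
    StaleKeepAlive,
}

#[derive(Default)]
struct PubkeyState {
    /// Highest maker timestamp accepted so far, per pair.
    latest_root_timestamp_by_pair: HashMap<String, u64>,
    /// Local clock reading when the pair was last seen alive.
    pair_last_seen_local: HashMap<String, u64>,
}

impl PubkeyState {
    /// Accept a keep-alive for `pair` only if it is newer than anything seen before.
    fn accept(&mut self, pair: &str, maker_timestamp: u64, local_now: u64) -> Result<(), GateError> {
        if let Some(&latest) = self.latest_root_timestamp_by_pair.get(pair) {
            // Assumption: equal timestamps count as replays.
            if maker_timestamp <= latest {
                return Err(GateError::StaleKeepAlive);
            }
        }
        self.latest_root_timestamp_by_pair.insert(pair.to_string(), maker_timestamp);
        self.pair_last_seen_local.insert(pair.to_string(), local_now);
        Ok(())
    }

    /// Pruning drops liveness but keeps the timestamp floor, so old roots stay blocked.
    fn prune_pair(&mut self, pair: &str) {
        self.pair_last_seen_local.remove(pair);
    }
}
```

Because `prune_pair` leaves the timestamp entry in place, a replayed keep-alive for a pruned pair is still rejected.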
Receiver‑side handling of empty roots
- If a keep‑alive carries a zero or hashed‑null root for a pair, clear local state for that pair and:
  - Store the null root in trie_roots,
  - Update the per‑pair maker timestamp (latest_root_timestamp_by_pair),
  - Remove the local last‑seen (pair_last_seen_local) so the pair becomes eligible for GC if it stays inactive.
- No sync is attempted for that pair.
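The empty-root path above might look like the following sketch. `Root` is modelled as 8 bytes, `HASHED_NULL_ROOT` is a placeholder value (the real one comes from `hashed_null_node::<Layout>()` and is not reproduced here), and `PubkeyState`, `orders_by_pair`, and `handle_empty_root` are hypothetical. Storing the root exactly as received is also an assumption.

```rust
use std::collections::HashMap;

type Root = [u8; 8];

const ZERO_ROOT: Root = [0u8; 8];
/// Placeholder only: stands in for the real hashed-null root.
const HASHED_NULL_ROOT: Root = [0xAA; 8];

fn is_empty_root(root: &Root) -> bool {
    *root == ZERO_ROOT || *root == HASHED_NULL_ROOT
}

#[derive(Default)]
struct PubkeyState {
    orders_by_pair: HashMap<String, Vec<String>>,
    trie_roots: HashMap<String, Root>,
    latest_root_timestamp_by_pair: HashMap<String, u64>,
    pair_last_seen_local: HashMap<String, u64>,
}

/// Returns `true` when the root was empty and fully handled here,
/// meaning no sync request is needed for this pair.
fn handle_empty_root(state: &mut PubkeyState, pair: &str, root: Root, maker_timestamp: u64) -> bool {
    if !is_empty_root(&root) {
        return false;
    }
    state.orders_by_pair.remove(pair); // clear local orders for the pair
    state.trie_roots.insert(pair.to_string(), root); // remember the null root
    state.latest_root_timestamp_by_pair.insert(pair.to_string(), maker_timestamp);
    state.pair_last_seen_local.remove(pair); // make the pair eligible for GC
    true
}
```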
Liveness and GC
- Track liveness per pair and prune only stale pairs; remove the pubkey when all its pairs are stale.
- GC now uses the constant MAKER_ORDER_TIMEOUT; the loop no longer reads an override from config.
- GetOrderbookPubkeyItem.last_keep_alive reflects the max local last‑seen across the pubkey’s pairs.
- When the orderbook is filled from relays (GetOrderbook response), we now set pair_last_seen_local for each imported (pubkey, pair), so last_keep_alive and GC behave correctly for imported state.
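A rough sketch of pair-level GC and the derived `last_keep_alive`, under stated assumptions: the timeout is passed in rather than hard-coded, a pair counts as stale once `now - last_seen` exceeds the timeout, and `gc` and the nested-map layout are hypothetical.

```rust
use std::collections::HashMap;

type Pubkey = String;
type Pair = String;

/// `last_keep_alive` for a pubkey: the max local last-seen across its pairs.
fn last_keep_alive(pairs: &HashMap<Pair, u64>) -> Option<u64> {
    pairs.values().max().copied()
}

/// Prune stale pairs, then drop pubkeys that have no live pair left.
/// Returns the removed pubkeys, sorted for deterministic output.
fn gc(
    pair_last_seen_local: &mut HashMap<Pubkey, HashMap<Pair, u64>>,
    now: u64,
    timeout: u64,
) -> Vec<Pubkey> {
    let mut removed = Vec::new();
    pair_last_seen_local.retain(|pubkey, pairs| {
        // Keep only pairs seen within the timeout window.
        pairs.retain(|_, last_seen| now.saturating_sub(*last_seen) <= timeout);
        if pairs.is_empty() {
            removed.push(pubkey.clone());
            false
        } else {
            true
        }
    });
    removed.sort();
    removed
}
```

A maker with one active pair and one abandoned pair keeps its pubkey entry; only the abandoned pair is pruned.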
Errors and logging
- Adds OrderbookP2PHandlerError::StaleKeepAlive and ::SyncFailure and treats SyncFailure as a warning to accommodate outdated peers.
Code cleanup:
- Removed a dead condition in keep‑alive broadcasting where if (root == H64::default()) && (root == hashed_null_node::<Layout>()) can never be true (no functional change).
  - Maybe we should not broadcast such roots at all and instead let them time out on the receiver side to reduce network load; will think about that.
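To illustrate why the removed guard was dead: a value cannot equal two different constants at once. In this sketch `HASHED_NULL_ROOT` is a placeholder nonzero value, not the real hash, and `dead_guard` is a hypothetical name.

```rust
type Root = [u8; 8];

const ZERO_ROOT: Root = [0u8; 8];
/// Placeholder only: any value different from ZERO_ROOT makes the point.
const HASHED_NULL_ROOT: Root = [0xAA; 8];

/// Shape of the removed condition: requires `root` to equal both constants.
fn dead_guard(root: Root) -> bool {
    root == ZERO_ROOT && root == HASHED_NULL_ROOT
}
```

Since the two constants differ, `dead_guard` returns `false` for every input, which is why removing it changes no behaviour.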

Why this stops “ghost orders”

We accept per‑pair data only if it reproduces the maker’s expected root.
We time out inactive pairs locally (per pair) using the node’s clock.

Out of scope (future work)

Peer scoring/quarantine for consistently failing relays.

TODO

Maybe we should not broadcast null roots and instead let them time out on the receiver side to reduce network load.
Refactor keepalive logic / code to a different file to start reducing the code in lp_ordermatch.rs. This will be done after reviews and approvals.

addresses #2594

…lag/logs
- On non-null roots, sync only if the applied diff/full trie reproduces the maker-advertised expected root; otherwise revert the pair and continue to the next peer.
- Walk peers sequentially (origin first); mark pubkeys unsynced on unresolved pairs and log consulted peers.

- `H64::default()` is `[0;8]` while `hashed_null_node::<Layout>()` is `Blake2b-8(0x00) ≠ 0`, so the && condition could never be true.

- Remove per-node `maker_order_timeout` override; use `MAKER_ORDER_TIMEOUT` so remote order GC is consistent across the network

mariocynicys

first iteration. a little messy in my head. still attaching strings.

1- could we add more doc comments for new functions.

2- so the previous code used to request the updated orderbook for maker x from the propagator. now we ask the propagator and fall back to the entire mesh if the propagator sends invalid data.
the question is: why would the propagator ever send invalid or not-up-to-date data?!
since they sent/forwarded the keepalive message, doesn't this mean they processed it themselves and they have what we believe is the correct state that we call here expected_roots_by_pair?

i.e. expected_roots_by_pair is directly what the propagator has as their view of the roots for this maker. why could they send invalid data when we ask about the diff we are missing to reach this expected view/roots (that they themselves sent us).

mm2src/mm2_main/src/lp_ordermatch.rs

dimxy · 2025-08-21T18:09:06Z

mm2src/mm2_main/src/lp_ordermatch.rs

        .expect("CryptoCtx not available")
        .mm2_internal_pubkey_hex();

-    let maker_order_timeout = ctx.conf["maker_order_timeout"].as_u64().unwrap_or(MAKER_ORDER_TIMEOUT);


Why is maker_order_timeout not used now, to avoid short-lived orders?
I can see this param is used in tests, maybe could be enabled with "for-tests" feature?

Thanks for finding this, there is a test failing due to this removal. Will feature gate this to one of the test features / cfg for sure.

Decided instead to extend the wait in the failing test to 16 seconds in 0f47c68, since the timeout in tests is only 15 seconds.
https://github.com/KomodoPlatform/komodo-defi-framework/blob/e8372dc76d50110d1a90107b83618e6965f1ba21/mm2src/mm2_main/src/lp_ordermatch.rs#L139-L140
https://github.com/KomodoPlatform/komodo-defi-framework/blob/e8372dc76d50110d1a90107b83618e6965f1ba21/mm2src/mm2_main/src/lp_ordermatch.rs#L142

Test still failing. It might be a different problem other than timeout config in tests, will investigate it now.

Ok, mm2_tests_main doesn't obey #[cfg(test)] while for-tests feature requires changes in the CI when running mm2_tests_main, since MIN_ORDER_KEEP_ALIVE_INTERVAL is in the same crate as mm2_tests_main.

84468c2 should allow for-tests to be used inside the mm2_main crate while wiring it in the right way. Any integration / docker test now requires the for-tests / run-docker-tests feature to be passed; if it is not passed, a compilation error is shown. For docker tests, run-docker-tests includes for-tests as a dependent feature.

I also added explicit [[test]] targets, which allow running any test in mm2_tests_main or docker_tests_main from the IDE using the “Run” (play) button.

The failing test_order_should_not_be_displayed_when_node_is_down should now work by using the test value of MIN_ORDER_KEEP_ALIVE_INTERVAL, as the for-tests feature was added to it.
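As a hedged illustration of the feature gating discussed here, a constant can switch value on the for-tests feature. The numbers and the `* 3` relation below are assumptions chosen so that the test build yields the 15-second timeout mentioned above; they are not taken from the crate.

```rust
// Test builds (feature "for-tests" enabled) get a short keep-alive interval.
// Per the comment above, run-docker-tests would list for-tests as a
// dependent feature in Cargo.toml, so docker tests enable it as well.
#[cfg(feature = "for-tests")]
pub const MIN_ORDER_KEEP_ALIVE_INTERVAL: u64 = 5; // assumed test value

// Production builds use a longer interval.
#[cfg(not(feature = "for-tests"))]
pub const MIN_ORDER_KEEP_ALIVE_INTERVAL: u64 = 30; // assumed production value

// Assumption: the maker order timeout is derived from the keep-alive interval.
pub const MAKER_ORDER_TIMEOUT: u64 = MIN_ORDER_KEEP_ALIVE_INTERVAL * 3;
```

Gating on a Cargo feature rather than `#[cfg(test)]` matters because integration tests link the crate as an ordinary dependency, which is built without `cfg(test)`.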

mm2src/mm2_main/src/lp_ordermatch.rs

Track liveness per trading pair using maker‑published latest_root_timestamp_by_pair and local pair_last_seen_local. Keep‑alive processing now requests trie diffs only from the peer that propagated the message (origin‑only sync); if any pairs remain unresolved after syncing, the keep‑alive is not forwarded and a SyncFailure is logged (warn). Apply garbage collection at the pair level based on pair_last_seen_local, and remove the pubkey once all of its pairs are stale. Remove the legacy global last_keep_alive/latest_maker_timestamp fields and the is_synced flag.

github-actions · 2025-08-29T12:53:51Z

KDF WASM Playground Previews

d4465fb: https://3b5e93ef.kdf-wasm-playground.pages.dev (Original WASM: 32M, Gzipped WASM: 11M)
4487d41: https://4a521574.kdf-wasm-playground.pages.dev (Original WASM: 32M, Gzipped WASM: 11M)

shamardy · 2025-08-30T01:09:12Z

I guess this can be considered ready for review even though I still plan to optimize some things and improve the logs a bit.

onur-ozkan

This is a rough review from my side. I will need to dig into the details later as I don't know the full context yet.

mm2src/mm2_main/src/lp_ordermatch.rs

shamardy · 2025-09-02T15:50:19Z

> I will need to dig into the details later as I don't know the full context yet.

@onur-ozkan the description wasn't updated after the latest changes, done now. This should give you more context.

- for‑tests is now wired through the imported crates and run‑docker‑tests includes it
- explicit [[test]] targets are declared with the required features
- order keep‑alive interval uses the for-tests feature
- CI is updated to pass the for-tests feature for integration tests

shamardy · 2025-09-03T01:00:07Z

Please note that commit 84468c2 should be reviewed independently of all the others; ref. #2605 (comment)

…celled_message` - It worked locally but failed in CI - Now it uses `wait_for_log` instead of just sleeping

onur-ozkan

> I will need to dig into the details later as I don't know the full context yet.
>
> @onur-ozkan the description wasn't updated after the latest changes, done now. This should give you more context.

Thanks!

LGTM other than my previous review.

mariocynicys

another round.

mm2src/mm2_main/src/lp_ordermatch.rs

mm2src/mm2_main/tests/feature_gate_for_tests.rs

mm2src/mm2_main/tests/mm2_tests/mm2_tests_inner.rs

mm2src/mm2_main/src/lp_ordermatch.rs

…s if a pair is emptied - also remove an old commented out code for orderbook validation

shamardy · 2025-09-04T21:13:16Z

@mariocynicys @dimxy this is ready for another review iteration

mariocynicys

Thanks! Resolved the last comments from the previous PR (after re-reviewing knowing we deal with the propagating seednode and not the originator).

Couple of last nits/questions.

mm2src/mm2_main/tests/mm2_tests/mm2_tests_inner.rs

mm2src/mm2_main/src/lp_ordermatch.rs

mariocynicys

Thanks! LGTM!

This reverts commit c6c3cc1

This reverts commit 9a76d11.

* dev:
  - fix(TPU): correct dexfee in check balance to prevent swap failures (#2600)
  - fix(tests): fix/remove kmd rewards failing test (#2633)
  - chore(ci): bump CI container image to debian bullseye-slim to match dev (#2641)
  - chore(release): add changelog entries for v2.5.2-beta (#2639)
  - chore(release): bump mm2 version to 2.5.2-beta (#2638)
  - feat(ci): add macos universal2 build (#2628)
  - fix(metrics): remove memory_db size metric (#2632)
  - chore(rust 1.90): make CI clippy/fmt pass
  - Revert "fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives (#2580)"
  - Revert "fix(orderbook): validate roots before commit (#2605)"

* dev:
  - fix(TPU): correct dexfee in check balance to prevent swap failures (#2600)
  - fix(tests): fix/remove kmd rewards failing test (#2633)
  - chore(ci): bump CI container image to debian bullseye-slim to match dev (#2641)
  - chore(release): add changelog entries for v2.5.2-beta (#2639)
  - chore(release): bump mm2 version to 2.5.2-beta (#2638)
  - feat(ci): add macos universal2 build (#2628)
  - fix(metrics): remove memory_db size metric (#2632)
  - fix(zcoin): exact-anchor witnesses in wasm get_spendable_notes (#2629)
  - fix(evm-swapv2): no mempool inclusion required for maker payment validation (#2618)
  - chore(rust 1.90): make CI clippy/fmt pass
  - Revert "fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives (#2580)"
  - Revert "fix(orderbook): validate roots before commit (#2605)"

This change does the following:
- Validates per‑pair diffs strictly against maker‑advertised roots, clearing pairs on mismatch.
- Tracks per‑pair liveness locally using `pair_last_seen_local` and prunes stale pairs; a pubkey is removed only when all its pairs are stale.

This reverts commit c6c3cc1

shamardy changed the title ~~fix(orderbook): harden convergence via commit-after-validate + sequential sync~~ fix(orderbook): validate roots before commit; sequential sync Aug 20, 2025

shamardy force-pushed the hotfix/sync-orderbook-fallback branch 3 times, most recently from fb5b59e to 2581414 Compare August 20, 2025 22:23

shamardy force-pushed the hotfix/sync-orderbook-fallback branch from 2581414 to 5cb899e Compare August 20, 2025 22:40

shamardy added 2 commits August 21, 2025 01:44

remove unreachable null-root guard

b2217f6

- `H64::default()` is `[0;8]` while `hashed_null_node::<Layout>()` is `Blake2b-8(0x00) ≠ 0`, so the && condition could never be true.

enforce global maker timeout for remote pubkeys

32f4284

- Remove per-node `maker_order_timeout` override; use `MAKER_ORDER_TIMEOUT` so remote order GC is consistent across the network

shamardy marked this pull request as ready for review August 20, 2025 22:46

shamardy requested a review from cipig August 20, 2025 22:47

shamardy added the status: pending review label Aug 20, 2025

mariocynicys reviewed Aug 21, 2025

shamardy added 2 commits August 21, 2025 18:19

wip

db166f2

Merge remote-tracking branch 'origin/dev' into hotfix/sync-orderbook-…

2ab576e

…fallback

dimxy reviewed Aug 21, 2025

don't propogate stale PubkeyKeepAlives, log-and-continue on sync erro…

4ffb7f7

…rs instead of aborting

shamardy marked this pull request as draft August 21, 2025 18:44

dimxy reviewed Aug 26, 2025

mm2src/mm2_main/src/lp_ordermatch.rs (outdated)

shamardy force-pushed the hotfix/sync-orderbook-fallback branch from cd8c7fa to e8372dc Compare August 29, 2025 03:28

Merge remote-tracking branch 'origin/dev' into hotfix/sync-orderbook-…

d4465fb

…fallback

shamardy added the deploy: wasm-playground label Aug 29, 2025

github-actions bot deployed to preview August 29, 2025 12:53

shamardy marked this pull request as ready for review August 30, 2025 01:08

onur-ozkan reviewed Sep 2, 2025

github-actions bot deployed to preview September 2, 2025 18:24

shamardy removed the deploy: wasm-playground label Sep 2, 2025

extend test_order_should_not_be_displayed_when_node_is_down wait to…

0f47c68

… 16s.

shamardy force-pushed the hotfix/sync-orderbook-fallback branch from 4487d41 to 0f47c68 Compare September 2, 2025 18:29

shamardy added 2 commits September 3, 2025 04:56

try to fix flaky `set_price_with_cancel_previous_should_broadcast_can…

099012e

…celled_message` - It worked locally but failed in CI - Now it uses `wait_for_log` instead of just sleeping
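The idea of waiting for a condition instead of sleeping for a fixed time can be sketched with a generic polling helper. `wait_until` below is a hypothetical illustration and is not the `wait_for_log` helper used in the tests, whose signature is not shown on this page.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` every `interval` until it returns true or `timeout` elapses.
/// Returns whether the condition became true in time.
fn wait_until<F: FnMut() -> bool>(timeout: Duration, interval: Duration, mut check: F) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A test built this way finishes as soon as the expected event shows up and only pays the full timeout on failure, which tends to be less flaky on slow CI runners than a fixed sleep.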

review fix: return per‑pair trie_roots from process_keep_alive and do…

89edefd

…cument behavior

onur-ozkan previously approved these changes Sep 3, 2025

mariocynicys reviewed Sep 3, 2025

review fixes: move SyncFailure comment and use filter_map instead…

484852e

… of for loop in `build_pubkey_state_sync_request`

shamardy dismissed onur-ozkan’s stale review via 484852e September 3, 2025 20:33

shamardy added 5 commits September 4, 2025 00:07

review fixes: clarify rate‑limiter comment

b009b53

review fix: remove double iteration while avoiding overlapping borrow…

ef4197c

…s and extra allocations

review fixes: prefer values().max().copied(), use ASCII apostrophe

06341fa

ordermatch: on cancel, advance per‑pair timestamp floor; drop livenes…

575d1d3

…s if a pair is emptied - also remove an old commented out code for orderbook validation

review fixes: last few ones

d5055b3

shamardy changed the title ~~fix(orderbook): validate roots before commit; sequential sync~~ fix(orderbook): validate roots before commit Sep 4, 2025

mariocynicys reviewed Sep 9, 2025

mm2src/mm2_main/tests/mm2_tests/mm2_tests_inner.rs

mm2src/mm2_main/tests/mm2_tests/mm2_tests_inner.rs

mm2src/mm2_main/src/lp_ordermatch.rs (outdated)

shamardy added 2 commits September 10, 2025 19:41

Merge remote-tracking branch 'origin/dev' into hotfix/sync-orderbook-…

8ef7450

…fallback

review fixes: test_own_orders_should_not_be_removed_from_orderbook …

967c8b8

…should wait 16 secs as this is the timeout for maker orders

mariocynicys approved these changes Sep 10, 2025

shamardy merged commit c6c3cc1 into dev Sep 10, 2025
21 of 41 checks passed

shamardy deleted the hotfix/sync-orderbook-fallback branch September 10, 2025 20:30

shamardy added a commit that referenced this pull request Oct 3, 2025

Revert "fix(orderbook): validate roots before commit (#2605)"

9a76d11

This reverts commit c6c3cc1

shamardy added a commit that referenced this pull request Oct 6, 2025

Revert "Revert "fix(orderbook): validate roots before commit (#2605)""

e3ad3ca

This reverts commit 9a76d11.

dimxy pushed a commit that referenced this pull request Oct 15, 2025

Revert "fix(orderbook): validate roots before commit (#2605)"

867bfd8

This reverts commit c6c3cc1
