
fix(orderbook): validate roots before commit #2605

Merged
shamardy merged 20 commits into dev from hotfix/sync-orderbook-fallback on Sep 10, 2025

Collaborator

@shamardy shamardy commented Aug 20, 2025

Background

What this PR changes

  • Commit-after-validate (per pair)
    • On PubkeyKeepAlive, compute expected roots per pair from the maker.
    • Request diffs/full tries only from the keep‑alive’s origin peer (propagated_from).
    • Ignore unsolicited pairs in the sync response.
    • Apply the diff/full trie and recompute the root per pair; commit only if it exactly matches the expected root.
    • On mismatch, reject the data, clear that pair locally, and do not propagate the message.
    • The same “do not propagate” rule applies to stale/replayed keep‑alives handled as StaleKeepAlive.
    • Propagation: forward the keep‑alive only if all requested pairs were validated and applied successfully; if any pair is stale, mismatched, or remains unresolved, the message is not propagated.
  • Unresolved pairs
    • If any requested pairs remain unresolved after the origin‑peer sync, return a SyncFailure (treated as a warning) and do not propagate the message.
  • Stale/replay protection (per pair)
    • Introduces a per‑pair monotonic maker timestamp gate (latest_root_timestamp_by_pair) and a local per‑pair last‑seen clock (pair_last_seen_local).
    • Rejects stale keep‑alives with OrderbookP2PHandlerError::StaleKeepAlive.
    • Replay guard: we intentionally retain latest_root_timestamp_by_pair entries even after a pair is pruned, to block stale replays of old roots; these entries are dropped only when the entire pubkey state is removed (which is still not ideal).
  • Receiver‑side handling of empty roots
    • If a keep‑alive carries a zero or hashed‑null root for a pair, clear local state for that pair and:
      • Store the null root in trie_roots,
      • Update the per‑pair maker timestamp (latest_root_timestamp_by_pair),
      • Remove the local last‑seen (pair_last_seen_local) so the pair becomes eligible for GC if it stays inactive.
    • No sync is attempted for that pair.
  • Liveness and GC
    • Track liveness per pair and prune only stale pairs; remove the pubkey when all its pairs are stale.
    • GC now uses the constant MAKER_ORDER_TIMEOUT; the loop no longer reads an override from config.
    • GetOrderbookPubkeyItem.last_keep_alive reflects the max local last‑seen across the pubkey’s pairs.
    • When the orderbook is filled from relays (GetOrderbook response), we now set pair_last_seen_local for each imported (pubkey, pair), so last_keep_alive and GC behave correctly for imported state.
  • Errors and logging
    • Adds OrderbookP2PHandlerError::StaleKeepAlive and ::SyncFailure and treats SyncFailure as a warning to accommodate outdated peers.
  • Code cleanup:
    • Removed a dead condition in keep‑alive broadcasting: (root == H64::default()) && (root == hashed_null_node::<Layout>()) can never be true, so removing it is not a functional change.
      • Maybe we should not broadcast such roots and let them time out on the receiver side instead, to reduce network load; I will think about that.
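The per-pair flow described above (stale gate first, then commit-after-validate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `PubkeyState`, `PairOutcome`, `toy_root`, and `process_pair` are hypothetical names, `toy_root` hashes a sorted order list with the standard library's `DefaultHasher` as a stand-in for recomputing a real trie root, and the gate is assumed to require a strictly increasing maker timestamp.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical outcome type; the PR's real errors live in `OrderbookP2PHandlerError`.
#[derive(Debug, PartialEq)]
enum PairOutcome {
    Committed,
    StaleKeepAlive,
    RootMismatch,
}

/// Hypothetical per-pubkey state, keyed by pair name (e.g. "KMD:BTC").
#[derive(Default)]
struct PubkeyState {
    orders_by_pair: HashMap<String, Vec<String>>,
    trie_roots: HashMap<String, u64>,
    latest_root_timestamp_by_pair: HashMap<String, u64>,
}

/// Stand-in for recomputing a trie root: hash the sorted order ids.
fn toy_root(orders: &[String]) -> u64 {
    let mut sorted = orders.to_vec();
    sorted.sort();
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

fn process_pair(
    state: &mut PubkeyState,
    pair: &str,
    maker_timestamp: u64,
    expected_root: u64,
    synced_orders: Vec<String>,
) -> PairOutcome {
    // Stale/replay gate (assumed strict): reject timestamps that do not advance.
    if let Some(&latest) = state.latest_root_timestamp_by_pair.get(pair) {
        if maker_timestamp <= latest {
            return PairOutcome::StaleKeepAlive;
        }
    }
    // Validate the candidate data before touching committed state.
    if toy_root(&synced_orders) != expected_root {
        // Mismatch: reject the data and clear the pair locally.
        // Leaving the timestamp untouched here is an assumption of this sketch.
        state.orders_by_pair.remove(pair);
        state.trie_roots.remove(pair);
        return PairOutcome::RootMismatch;
    }
    // Commit only after the recomputed root matched the expected root exactly.
    state.orders_by_pair.insert(pair.to_string(), synced_orders);
    state.trie_roots.insert(pair.to_string(), expected_root);
    state
        .latest_root_timestamp_by_pair
        .insert(pair.to_string(), maker_timestamp);
    PairOutcome::Committed
}
```

In this sketch a caller would forward the keep-alive only if every requested pair returned `PairOutcome::Committed`, mirroring the propagation rule in the description.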

Why this stops “ghost orders”

  • We accept per‑pair data only if it reproduces the maker’s expected root.
  • We time out inactive pairs locally (per pair) using the node’s clock.
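The local per-pair timeout can be pictured with a small sketch. The names `PairLastSeenLocal`, `gc_stale_pairs`, and `last_keep_alive` are hypothetical, the timeout value is illustrative only, and treating a pair as live when its age is less than or equal to the timeout is an assumption; the sketch only shows the shape of "prune stale pairs, drop the pubkey when none remain, report the max last-seen".

```rust
use std::collections::HashMap;

/// Illustrative value only; the real `MAKER_ORDER_TIMEOUT` is defined in the codebase.
const MAKER_ORDER_TIMEOUT: u64 = 90;

/// Hypothetical container: pubkey -> (pair -> local last-seen time, in seconds).
type PairLastSeenLocal = HashMap<String, HashMap<String, u64>>;

/// Drops pairs not seen within the timeout, then pubkeys with no live pairs left.
fn gc_stale_pairs(last_seen: &mut PairLastSeenLocal, now: u64) {
    for pairs in last_seen.values_mut() {
        // Keep a pair only if its age (by the local clock) is within the timeout.
        pairs.retain(|_pair, seen| now.saturating_sub(*seen) <= MAKER_ORDER_TIMEOUT);
    }
    // A pubkey is removed only once all of its pairs have been pruned.
    last_seen.retain(|_pubkey, pairs| !pairs.is_empty());
}

/// `last_keep_alive` for a pubkey: the max local last-seen across its pairs.
fn last_keep_alive(last_seen: &PairLastSeenLocal, pubkey: &str) -> Option<u64> {
    last_seen.get(pubkey)?.values().copied().max()
}
```

Because the age is computed from the receiving node's own clock, a maker that stops sending keep-alives for one pair loses only that pair, while its other pairs stay visible.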

Out of scope (future work)

  • Peer scoring/quarantine for consistently failing relays.

TODO

  • Maybe we should not broadcast null roots and let them time out on the receiver side instead, to reduce network load.
  • Refactor keepalive logic / code to a different file to start reducing the code in lp_ordermatch.rs. This will be done after reviews and approvals.

addresses #2594

@shamardy shamardy changed the title from "fix(orderbook): harden convergence via commit-after-validate + sequential sync" to "fix(orderbook): validate roots before commit; sequential sync" Aug 20, 2025
@shamardy shamardy force-pushed the hotfix/sync-orderbook-fallback branch 3 times, most recently from fb5b59e to 2581414 Compare August 20, 2025 22:23
…lag/logs

- On non-null roots, sync only if the applied diff/full trie reproduces the maker-advertised expected root; otherwise revert the pair and continue to the next peer.
- Walk peers sequentially (origin first); mark pubkeys unsynced on unresolved pairs and log consulted peers.
@shamardy shamardy force-pushed the hotfix/sync-orderbook-fallback branch from 2581414 to 5cb899e Compare August 20, 2025 22:40
- `H64::default()` is `[0; 8]` while `hashed_null_node::<Layout>()` is `Blake2b-8(0x00) ≠ 0`, so the `&&` condition could never be true.
- Remove per-node `maker_order_timeout` override; use `MAKER_ORDER_TIMEOUT` so remote order GC is consistent across the network
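Why the removed `&&` condition was dead can be shown with a tiny sketch. The `Root` alias and both constants are stand-ins: `HASHED_NULL_ROOT` uses placeholder bytes rather than the real output of `hashed_null_node::<Layout>()`, and the only property relied on is that it differs from the all-zero root.

```rust
/// 8-byte root, mirroring the `[0; 8]` shape of `H64::default()`.
type Root = [u8; 8];

const ZERO_ROOT: Root = [0; 8];
/// Placeholder bytes standing in for the hashed null node; the real value differs.
const HASHED_NULL_ROOT: Root = [0xAA; 8];

/// The removed condition: one value cannot equal two distinct constants at once.
fn dead_condition(root: Root) -> bool {
    root == ZERO_ROOT && root == HASHED_NULL_ROOT
}

/// An "empty root" check in the sense of the description ("zero or hashed-null").
fn is_empty_root(root: Root) -> bool {
    root == ZERO_ROOT || root == HASHED_NULL_ROOT
}
```

`dead_condition` returns `false` for every input, which is why dropping it changes no behavior, while `is_empty_root` matches both kinds of empty root.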
@shamardy shamardy marked this pull request as ready for review August 20, 2025 22:46
@shamardy shamardy requested a review from cipig August 20, 2025 22:47
Collaborator

@mariocynicys mariocynicys left a comment

first iteration. a little messy in my head. still attaching strings.

1- could we add more doc comments for new functions.

2- so the previous code used to request the updated orderbook for maker x from the propagator. now we ask the propagator and fall back to the entire mesh if the propagator sends invalid data.
the question is: why would the propagator ever send invalid or not-up-to-date data?!
since they sent/forwarded the keepalive message, doesn't this mean they processed it themselves and they have what we believe is the correct state that we call here expected_roots_by_pair?

i.e. expected_roots_by_pair is directly what the propagator has as their view of the roots for this maker. why could they send invalid data when we ask about the diff we are missing to reach this expected view/roots (that they themselves sent us).

.expect("CryptoCtx not available")
.mm2_internal_pubkey_hex();

let maker_order_timeout = ctx.conf["maker_order_timeout"].as_u64().unwrap_or(MAKER_ORDER_TIMEOUT);
Collaborator

Why is maker_order_timeout not used now, to avoid short-lived orders?
I can see this param is used in tests, maybe could be enabled with "for-tests" feature?

Collaborator Author

Thanks for finding this, there is a test failing due to this removal. Will feature gate this to one of the test features / cfg for sure.

Collaborator Author

@shamardy shamardy Sep 2, 2025

Collaborator Author

Test still failing. It might be a different problem other than timeout config in tests, will investigate it now.

Collaborator Author

@shamardy shamardy Sep 2, 2025

Ok, mm2_tests_main doesn't obey #[cfg(test)], and using the for-tests feature requires changes in the CI when running mm2_tests_main, since MIN_ORDER_KEEP_ALIVE_INTERVAL is in the same crate as mm2_tests_main.

Collaborator Author

@shamardy shamardy Sep 3, 2025

84468c2 should allow for-tests to be used inside the mm2_main crate while wiring it in the right way. Any integration / docker test now requires the for-tests / run-docker-tests feature to be passed; if it is not passed, a compilation error is shown. For docker tests, run-docker-tests includes for-tests as a dependent feature.

I also added explicit [[test]] targets, which allow running any test in mm2_tests_main or docker_tests_main from the IDE using the “Run” (play) button.

The failing test_order_should_not_be_displayed_when_node_is_down should now work by using the test value of MIN_ORDER_KEEP_ALIVE_INTERVAL as for-tests feature was added to it.
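Feature-gating a constant in this way could look roughly like the sketch below. The numeric values are illustrative and not taken from the PR, and deriving `MAKER_ORDER_TIMEOUT` from `MIN_ORDER_KEEP_ALIVE_INTERVAL` is an assumption made here to show why a shorter test interval would also shorten the order timeout.

```rust
/// Production interval in seconds (illustrative value, not taken from the PR).
#[cfg(not(feature = "for-tests"))]
pub const MIN_ORDER_KEEP_ALIVE_INTERVAL: u64 = 30;

/// Shorter interval, compiled in only when the `for-tests` feature is enabled
/// (illustrative value).
#[cfg(feature = "for-tests")]
pub const MIN_ORDER_KEEP_ALIVE_INTERVAL: u64 = 5;

/// Assumed relationship for this sketch: the GC timeout is a multiple of the
/// keep-alive interval, so tests built with `for-tests` expire orders sooner.
pub const MAKER_ORDER_TIMEOUT: u64 = MIN_ORDER_KEEP_ALIVE_INTERVAL * 3;
```

Exactly one of the two `cfg` branches is compiled, so the rest of the code refers to a single constant name regardless of how the crate was built.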

@shamardy shamardy marked this pull request as draft August 21, 2025 18:44
Track liveness per trading pair using maker‑published latest_root_timestamp_by_pair and local pair_last_seen_local. Keep‑alive processing now requests trie diffs only from the peer that propagated the message (origin‑only sync); if any pairs remain unresolved after syncing, the keep‑alive is not forwarded and a SyncFailure is logged (warn). Apply garbage collection at the pair level based on pair_last_seen_local, and remove the pubkey once all of its pairs are stale. Remove the legacy global last_keep_alive/latest_maker_timestamp fields and the is_synced flag.
@shamardy shamardy force-pushed the hotfix/sync-orderbook-fallback branch from cd8c7fa to e8372dc Compare August 29, 2025 03:28

github-actions bot commented Aug 29, 2025

KDF WASM Playground Previews

d4465fb: https://3b5e93ef.kdf-wasm-playground.pages.dev (Original WASM: 32M, Gzipped WASM: 11M)
4487d41: https://4a521574.kdf-wasm-playground.pages.dev (Original WASM: 32M, Gzipped WASM: 11M)

@shamardy shamardy marked this pull request as ready for review August 30, 2025 01:08
@shamardy
Collaborator Author

I guess this can be considered ready for review even though I still plan to optimize some things and improve the logs a bit.

@onur-ozkan onur-ozkan left a comment

This is a rough review from my side. I will need to dig into the details later as I don't know the full context yet.

@shamardy
Collaborator Author

shamardy commented Sep 2, 2025

I will need to dig into the details later as I don't know the full context yet.

@onur-ozkan the description wasn't updated after the latest changes, done now. This should give you more context.

@shamardy shamardy force-pushed the hotfix/sync-orderbook-fallback branch from 4487d41 to 0f47c68 Compare September 2, 2025 18:29
- for‑tests is now wired through the imported crates and run‑docker‑tests includes it
- explicit [[test]] targets are declared with the required features
- order keep‑alive interval uses the for-tests feature
- CI is updated to pass the for-tests feature for integration tests
@shamardy
Collaborator Author

shamardy commented Sep 3, 2025

Please note that commit 84468c2 should be reviewed independently of all the others; ref. #2605 (comment)

…celled_message`

- It worked locally but failed in CI
- Now it uses `wait_for_log` instead of just sleeping
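The difference between a fixed sleep and waiting on a condition can be sketched with a small polling helper. `wait_until` below is a hypothetical stand-in written for illustration; it is not the project's `wait_for_log`, whose signature is not shown here.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `condition` every `interval` until it holds
/// or `timeout` elapses. Returns whether the condition was observed.
fn wait_until(
    timeout: Duration,
    interval: Duration,
    mut condition: impl FnMut() -> bool,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A fixed sleep passes locally when the machine is fast and fails on a slower CI runner; polling with a deadline returns as soon as the condition holds and fails only after the full timeout.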
onur-ozkan previously approved these changes Sep 3, 2025

@onur-ozkan onur-ozkan left a comment

I will need to dig into the details later as I don't know the full context yet.

@onur-ozkan the description wasn't updated after the latest changes, done now. This should give you more context.

Thanks!

LGTM other than my previous review.

Collaborator

@mariocynicys mariocynicys left a comment

another round.

… of for loop in `build_pubkey_state_sync_request`
@shamardy
Collaborator Author

shamardy commented Sep 4, 2025

@mariocynicys @dimxy this is ready for another review iteration

@shamardy shamardy changed the title from "fix(orderbook): validate roots before commit; sequential sync" to "fix(orderbook): validate roots before commit" Sep 4, 2025
Collaborator

@mariocynicys mariocynicys left a comment

Thanks! Resolved the last comments from the previous PR (after re-reviewing knowing we deal with the propagating seednode and not the originator).

Couple of last nits/questions.

Collaborator

@mariocynicys mariocynicys left a comment

Thanks! LGTM!

@shamardy shamardy merged commit c6c3cc1 into dev Sep 10, 2025
21 of 41 checks passed
@shamardy shamardy deleted the hotfix/sync-orderbook-fallback branch September 10, 2025 20:30
shamardy added a commit that referenced this pull request Oct 3, 2025
shamardy added a commit that referenced this pull request Oct 6, 2025
dimxy pushed a commit that referenced this pull request Oct 8, 2025
* dev:
  fix(TPU): correct dexfee in check balance to prevent swap failures (#2600)
  fix(tests): fix/remove kmd rewards failing test (#2633)
  chore(ci): bump CI container image to debian bullseye-slim to match dev (#2641)
  chore(release): add changelog entries for v2.5.2-beta (#2639)
  chore(release): bump mm2 version to 2.5.2-beta (#2638)
  feat(ci): add macos universal2 build (#2628)
  fix(metrics): remove memory_db size metric (#2632)
  chore(rust 1.90): make CI clippy/fmt pass
  Revert "fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives (#2580)"
  Revert "fix(orderbook): validate roots before commit (#2605)"
dimxy pushed a commit that referenced this pull request Oct 9, 2025
* dev:
  fix(TPU): correct dexfee in check balance to prevent swap failures (#2600)
  fix(tests): fix/remove kmd rewards failing test (#2633)
  chore(ci): bump CI container image to debian bullseye-slim to match dev (#2641)
  chore(release): add changelog entries for v2.5.2-beta (#2639)
  chore(release): bump mm2 version to 2.5.2-beta (#2638)
  feat(ci): add macos universal2 build (#2628)
  fix(metrics): remove memory_db size metric (#2632)
  fix(zcoin): exact-anchor witnesses in wasm get_spendable_notes (#2629)
  fix(evm-swapv2): no mempool inclusion required for maker payment validation (#2618)
  chore(rust 1.90): make CI clippy/fmt pass
  Revert "fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives (#2580)"
  Revert "fix(orderbook): validate roots before commit (#2605)"
dimxy pushed a commit that referenced this pull request Oct 15, 2025
This change does the following:
- Validates per‑pair diffs strictly against maker‑advertised roots while clearing pairs on mismatch.
- Tracks per‑pair liveness locally using `pair_last_seen_local` while pruning stale pairs; a pubkey is removed only when all of its pairs are stale.
dimxy pushed a commit that referenced this pull request Oct 15, 2025
4 participants