fix(orderbook): validate roots before commit#2605
Conversation
fb5b59e to
2581414
Compare
…lag/logs - On non-null roots, sync only if the applied diff/full trie reproduces the maker-advertised expected root; otherwise revert the pair and continue to the next peer. - Walk peers sequentially (origin first); mark pubkeys unsynced on unresolved pairs and log consulted peers.
2581414 to
5cb899e
Compare
- `H64::default()`` is `[0;8]`` while `hashed_null_node::<Layout>()` is `Blake2b-8(0x00) ≠ 0`, so the && condition could never be true.
- Remove per-node `maker_order_timeout` override; use `MAKER_ORDER_TIMEOUT` so remote order GC is consistent across the network
mariocynicys
left a comment
There was a problem hiding this comment.
first iteration. a little messy in my head. still attaching strings.
1- could we add more doc comments for new functions.
2- so the previous code used to request the updated orderbook for maker x from the propagator. now we ask the propagator and fall back to the entire mesh if the propagator sends invalid data.
the question is: why would the propagator ever send invalid or not-up-to-date data?!
since they sent/forwarded the keepalive message, doesn't this mean they processed it themselves and they have what we believe is the correct state that we call here expected_roots_by_pair?
i.e. expected_roots_by_pair is directly what the propagator has as their view of the roots for this maker. why could they send invalid data when we ask about the diff we are missing to reach this expected view/roots (that they themselves sent us).
| .expect("CryptoCtx not available") | ||
| .mm2_internal_pubkey_hex(); | ||
|
|
||
| let maker_order_timeout = ctx.conf["maker_order_timeout"].as_u64().unwrap_or(MAKER_ORDER_TIMEOUT); |
There was a problem hiding this comment.
Why is maker_order_timeout not used now, to avoid short-lived orders?
I can see this param is used in tests, maybe could be enabled with "for-tests" feature?
There was a problem hiding this comment.
Thanks for finding this, there is a test failing due to this removal. Will feature gate this to one of the test features / cfg for sure.
There was a problem hiding this comment.
Decided to extend the wait in the failing test to 16 seconds instead here 0f47c68 since the timeout in tests is only 15 seconds.
https://github.com/KomodoPlatform/komodo-defi-framework/blob/e8372dc76d50110d1a90107b83618e6965f1ba21/mm2src/mm2_main/src/lp_ordermatch.rs#L139-L140
https://github.com/KomodoPlatform/komodo-defi-framework/blob/e8372dc76d50110d1a90107b83618e6965f1ba21/mm2src/mm2_main/src/lp_ordermatch.rs#L142
There was a problem hiding this comment.
Test still failing. It might be a different problem other than timeout config in tests, will investigate it now.
There was a problem hiding this comment.
Ok, mm2_tests_main doesn't obey #[cfg(test)] while for-tests feature requires changes in the CI when running mm2_tests_main, since MIN_ORDER_KEEP_ALIVE_INTERVAL is in the same crate as mm2_tests_main.
There was a problem hiding this comment.
84468c2 should allow for-tests to be used inside mm2_main crate while wiring it in the right way. Any integration / docker test now requires for-tests / run-docker-tests feature to be passed, if not passed a compilation error will show. For docker tests ,run-docker-tests includes for-tests as a dependent feature.
I also added an explicit [[test]] targets which allowed running any test from the IDE using the “Run” (play) button to work on mm2_tests_main or docker_tests_main.
The failing test_order_should_not_be_displayed_when_node_is_down should now work by using the test value of MIN_ORDER_KEEP_ALIVE_INTERVAL as for-tests feature was added to it.
…rs instead of aborting
Track liveness per trading pair using maker‑published latest_root_timestamp_by_pair and local pair_last_seen_local. Keep‑alive processing now requests trie diffs only from the peer that propagated the message (origin‑only sync); if any pairs remain unresolved after syncing, the keep‑alive is not forwarded and a SyncFailure is logged (warn). Apply garbage collection at the pair level based on pair_last_seen_local, and remove the pubkey once all of its pairs are stale. Remove the legacy global last_keep_alive/latest_maker_timestamp fields and the is_synced flag.
cd8c7fa to
e8372dc
Compare
KDF WASM Playground Previews
|
|
I guess this can be considered ready for review even though I still plan to optimize some things and improve the logs a bit. |
onur-ozkan
left a comment
There was a problem hiding this comment.
This is a rough review from my side. I will need to dig into the details later as I don't know the full context yet.
@onur-ozkan the description wasn't updated after the latest changes, done now. This should give you more context. |
4487d41 to
0f47c68
Compare
- for‑tests are now wired through the imported crates and run‑docker‑tests includes it - explicit [[test]] targets are declared with the required features - order keep‑alive interval uses for-tests feature - CI is updated to pass the for-test feature for integration tests
|
Please note that this commit 84468c2 should be reviewed independent from all others ref. #2605 (comment) |
…celled_message` - It worked locally but failed in CI - Now it uses `wait_for_log` instead of just sleeping
onur-ozkan
left a comment
There was a problem hiding this comment.
I will need to dig into the details later as I don't know the full context yet.
@onur-ozkan the description wasn't updated after the latest changes, done now. This should give you more context.
Thanks!
LGTM other than my previous review.
… of for loop in `build_pubkey_state_sync_request`
…s and extra allocations
…s if a pair is emptied - also remove an old commented out code for orderbook validation
|
@mariocynicys @dimxy this is ready for another review iteration |
mariocynicys
left a comment
There was a problem hiding this comment.
Thanks! Resolved the last comments from the previous PR (after re-reviewing knowing we deal with the propagating seednode and not the originator).
Couple of last nits/questions.
…should wait 16 secs as this is the timeout for maker orders
* dev: fix(TPU): correct dexfee in check balance to prevent swap failures (#2600) fix(tests): fix/remove kmd rewards failing test (#2633) chore(ci): bump CI container image to debian bullseye-slim to match dev (#2641) chore(release): add changelog entries for v2.5.2-beta (#2639) chore(release): bump mm2 version to 2.5.2-beta (#2638) feat(ci): add macos universal2 build (#2628) fix(metrics): remove memory_db size metric (#2632) chore(rust 1.90): make CI clippy/fmt pass Revert "fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives (#2580)" Revert "fix(orderbook): validate roots before commit (#2605)"
* dev: fix(TPU): correct dexfee in check balance to prevent swap failures (#2600) fix(tests): fix/remove kmd rewards failing test (#2633) chore(ci): bump CI container image to debian bullseye-slim to match dev (#2641) chore(release): add changelog entries for v2.5.2-beta (#2639) chore(release): bump mm2 version to 2.5.2-beta (#2638) feat(ci): add macos universal2 build (#2628) fix(metrics): remove memory_db size metric (#2632) fix(zcoin): exact-anchor witnesses in wasm get_spendable_notes (#2629) fix(evm-swapv2): no mempool inclusion required for maker payment validation (#2618) chore(rust 1.90): make CI clippy/fmt pass Revert "fix(ordermatch): ignore loop-back; clear on null root; reject stale keep-alives (#2580)" Revert "fix(orderbook): validate roots before commit (#2605)"
This change does the following: - Validates per‑pair diffs strictly against maker‑advertised roots while clearing pairs on mismatch. - Tracks per‑pair liveness locally using `pair_last_seen_local` while pruning stale pairs, pubkeys are removed only when all its pairs are stale.
Background
InvalidStateRoot([0,…])(see investigate pubkey keep alive warnings in logs #2594).pair_last_seen_local) and pruning stale pairs, removing a pubkey only when all its pairs are stale.What this PR changes
PubkeyKeepAlive, compute expected roots per pair from the maker.propagated_from).StaleKeepAlive.SyncFailure(treated as a warning) and do not propagate the message.latest_root_timestamp_by_pair) and a local per‑pair last‑seen clock (pair_last_seen_local).OrderbookP2PHandlerError::StaleKeepAlive.latest_root_timestamp_by_pairentries even after a pair is pruned, to block stale replays of old roots; these entries are dropped only when the entire pubkey state is removed (which is not ideal still).trie_roots,latest_root_timestamp_by_pair),pair_last_seen_local) so the pair becomes eligible for GC if it stays inactive.MAKER_ORDER_TIMEOUT; the loop no longer reads an override from config.GetOrderbookPubkeyItem.last_keep_alivereflects the max local last‑seen across the pubkey’s pairs.GetOrderbookresponse), we now setpair_last_seen_localfor each imported (pubkey, pair), solast_keep_aliveand GC behave correctly for imported state.OrderbookP2PHandlerError::StaleKeepAliveand::SyncFailureand treatsSyncFailureas a warning to accommodate outdated peers.if (root == H64::default()) && (root == hashed_null_node::<Layout>())can never be true (no functional change)Why this stops “ghost orders”
Out of scope (future work)
TODO
lp_ordermatch.rs. This will be done after reviews and approvals.addresses #2594