feat(mocker): add offline disagg replay by PeaBrane · Pull Request #7617 · ai-dynamo/dynamo

PeaBrane · 2026-03-25T01:49:33Z

Add an offline disaggregated replay path with separate prefill/decode worker pools, staged Python/CLI bindings, and replay docs updates. This also keeps the aggregated replay path intact while extending tests around disagg routing, metrics, and timing behavior.

Summary by CodeRabbit

Release Notes

New Features
- Added router configuration flag to track or ignore prompt-side prefill tokens in load accounting (--router-track-prefill-tokens / --no-router-track-prefill-tokens).
- Added support for offline disaggregated replay mode with separate prefill and decode engine configuration and worker counts.

Documentation
- Updated router and replay documentation with new CLI options and disaggregated mode usage guidance.

Signed-off-by: PeaBrane <yanrpei@gmail.com>

github-actions · 2026-03-25T01:51:00Z

🌿 Fern Docs Preview: https://nvidia-preview-0112bfc3-47c8-4269-b6ac-244cef269c19.docs.buildwithfern.com/dynamo/dev

coderabbitai · 2026-03-25T02:05:03Z

Walkthrough

This pull request introduces a new router_track_prefill_tokens boolean configuration flag (default: true) to control whether prompt-side prefill tokens are included in the router's active load accounting. The feature is integrated through configuration, CLI, routing, scheduling, and replay paths. Additionally, new offline disaggregated ("disagg") replay infrastructure is added, supporting separate prefill and decode worker pools with independent engines and routers.
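
As a rough illustration of the flag pair described above, here is a minimal Python sketch of a boolean CLI option that defaults to true and can fall back to an environment variable. The flag and variable names come from this PR; the `env_flag` helper, the set of accepted truthy strings, and the parser itself are assumptions made for illustration and are not the router's actual argument handling.

```python
import argparse
import os


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean from the environment (hypothetical helper, not from the PR)."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # The accepted truthy spellings are an assumption for this sketch.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="router-sketch")
    # BooleanOptionalAction (Python 3.9+) generates both
    # --router-track-prefill-tokens and --no-router-track-prefill-tokens.
    parser.add_argument(
        "--router-track-prefill-tokens",
        action=argparse.BooleanOptionalAction,
        default=env_flag("DYN_ROUTER_TRACK_PREFILL_TOKENS", True),
        help="Include prompt-side prefill tokens in active load accounting.",
    )
    return parser
```

With this shape, an explicit flag on the command line overrides the environment variable, and the environment variable overrides the built-in default of true.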

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration & CLI `components/src/dynamo/common/configuration/groups/kv_router_args.py`, `components/src/dynamo/router/__main__.py` | Added `router_track_prefill_tokens` to KV router configuration exports and CLI flag `--router-track-prefill-tokens` with environment variable `DYN_ROUTER_TRACK_PREFILL_TOKENS` (default true); integrated into startup logging. |
| Router Documentation `components/src/dynamo/router/README.md`, `docs/components/router/README.md`, `docs/components/router/router-guide.md` | Documented new CLI flag `--no-router-track-prefill-tokens` with usage guidance for decode-only routing paths; updated disaggregated serving characteristics to include `track_prefill_tokens=false`. |
| Replay Documentation `docs/benchmarks/mocker-trace-replay.md`, `docs/mocker/mocker.md` | Added new disaggregated offline replay CLI options (`--prefill-engine-args`, `--decode-engine-args`, `--num-prefill-workers`, `--num-decode-workers`) with configuration constraints and examples. |
| Routing & Scheduling Core `lib/kv-router/src/scheduling/config.rs`, `lib/kv-router/src/scheduling/local.rs`, `lib/kv-router/src/scheduling/types.rs`, `lib/llm/src/kv_router.rs`, `lib/llm/src/kv_router/scheduler.rs`, `lib/llm/src/kv_router/prefill_router.rs` | Added `router_track_prefill_tokens` field to `KvRouterConfig` and `RouterConfigOverride` with helper methods; integrated into `LocalScheduler` and `KvScheduler` to control load estimation via new `track_prefill_tokens` parameter; decode router overrides now disable prefill token tracking. |
| Sequence Tracking `lib/kv-router/src/sequences/multi_worker.rs`, `lib/kv-router/src/sequences/single.rs`, `lib/kv-router/src/protocols.rs`, `lib/kv-router/src/scheduling/queue.rs`, `lib/kv-router/src/scheduling/policy.rs` | Added `track_prefill_tokens: bool` field to `SequenceRequest` and `ActiveSequenceEventData::AddRequest`; introduced `add_request_with_prefill_tracking(...)` and `potential_blocks_and_tokens_with_prefill_tracking(...)` methods to conditionally account for prefill load. |
| Offline Disaggregated Replay Core `lib/mocker/src/replay/offline/disagg.rs`, `lib/mocker/src/replay/offline/events.rs`, `lib/mocker/src/replay/offline/runtime_utils.rs`, `lib/mocker/src/replay/offline/mod.rs`, `lib/mocker/src/replay/offline/README.md` | Implemented new offline disaggregated replay simulator with separate prefill/decode worker pools, dual KV routers, staged event processing, and request state management; added `SimulationWorkerStage` enum and `WorkerCompletionPayload` for multi-stage event handling; added shared timing and admission utilities. |
| Replay Infrastructure & Validation `lib/mocker/src/replay/entrypoints.rs`, `lib/mocker/src/replay/mod.rs`, `lib/mocker/src/replay/validate.rs`, `lib/mocker/src/replay/router/offline.rs`, `lib/mocker/src/replay/router/online.rs`, `lib/mocker/src/replay/router/shared.rs`, `lib/mocker/src/replay/offline/state.rs`, `lib/mocker/src/replay/offline/multi.rs` | Added `OfflineDisaggReplayConfig` struct and disaggregated entrypoints; introduced `validate_offline_disagg_replay_args` and `validate_offline_disagg_concurrency_args` validators; integrated `track_prefill_tokens` into offline replay router construction; refactored multi-worker runtime to use shared timing/event utilities; added `execute_hidden_pass` methods to worker state and engine core. |
| Scheduler Engine Implementations `lib/mocker/src/scheduler/mod.rs`, `lib/mocker/src/scheduler/vllm/core.rs`, `lib/mocker/src/scheduler/sglang/core.rs` | Added `execute_hidden_pass(...)` method to `EngineCore` and engine implementations (vLLM, SGLang) for hidden pass execution without trace collection. |
| C & Python Bindings `lib/bindings/c/src/lib.rs`, `lib/bindings/python/rust/llm/entrypoint.rs` | Exposed `router_track_prefill_tokens` in C and Python `KvRouterConfig` constructors; C bindings read environment variable `DYN_ROUTER_TRACK_PREFILL_TOKENS` and include in decode router overrides. |
| Python Replay API `lib/bindings/python/rust/llm/replay.rs`, `lib/bindings/python/src/dynamo/replay/api.py`, `lib/bindings/python/src/dynamo/replay/main.py` | Extended `run_mocker_trace_replay` and `run_mocker_synthetic_trace_replay` signatures with `prefill_engine_args`, `decode_engine_args`, `num_prefill_workers`, `num_decode_workers`; introduced `load_replay_args_selection` to branch on aggregated vs. disaggregated mode; added CLI parsing for new disaggregated engine and worker-count parameters. |
| Benchmark & Test Updates `lib/bench/kv_router/active_sequences_bench.rs`, `lib/bindings/python/tests/test_replay.py` | Set `track_prefill_tokens: true` in benchmark sequence requests; added disaggregated replay tests covering offline trace/synthetic replay, error conditions, and CLI subprocess smoke test with disaggregated worker configuration. |
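
To make the "separate prefill/decode worker pools" idea in the table concrete, the following is a toy two-stage queueing sketch in Python. It is not the simulator added in `lib/mocker/src/replay/offline/disagg.rs`: the function name, the per-token cost parameters, and the one-request-per-worker model (no batching, no KV routing) are all simplifying assumptions for illustration.

```python
def simulate_disagg(requests, num_prefill_workers, num_decode_workers,
                    prefill_ms_per_token, decode_ms_per_token):
    """Toy two-stage replay: each request is prefilled, then handed to decode.

    `requests` is a list of (request_id, arrival_ms, input_tokens, output_tokens).
    Every worker serves one request at a time, which is an assumption of this
    sketch and not how the real mocker engines schedule work.
    """
    prefill_free_at = [0.0] * num_prefill_workers
    decode_free_at = [0.0] * num_decode_workers

    # Stage 1: prefill in arrival order on the earliest-free prefill worker.
    handoffs = []
    for request_id, arrival_ms, input_tokens, output_tokens in sorted(
        requests, key=lambda r: r[1]
    ):
        worker = min(range(num_prefill_workers), key=prefill_free_at.__getitem__)
        start = max(arrival_ms, prefill_free_at[worker])
        done = start + input_tokens * prefill_ms_per_token
        prefill_free_at[worker] = done
        handoffs.append((done, request_id, output_tokens))

    # Stage 2: decode in order of prefill completion on the earliest-free
    # decode worker; decode cannot start before the prefill handoff.
    report = {}
    for prefill_done_ms, request_id, output_tokens in sorted(handoffs):
        worker = min(range(num_decode_workers), key=decode_free_at.__getitem__)
        start = max(prefill_done_ms, decode_free_at[worker])
        done = start + output_tokens * decode_ms_per_token
        decode_free_at[worker] = done
        report[request_id] = {
            "prefill_done_ms": prefill_done_ms,
            "decode_done_ms": done,
        }
    return report
```

Even in this reduced form, changing the two pool sizes independently shows where a workload queues: a single prefill worker serializes prompt processing regardless of how many decode workers are available.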

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description provided is brief but missing key structure from the template. It lacks an 'Overview' section, detailed 'Details' breakdown, 'Where should the reviewer start' guidance, and 'Related Issues' reference. | Expand the description to include all template sections: Overview (high-level purpose), Details (breakdown of changes), Where should the reviewer start (key files), and Related Issues (GitHub issue number if applicable). |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.17% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'feat(mocker): add offline disagg replay' clearly describes the main feature being added: an offline disaggregated replay capability. It follows conventional commit format and is specific and concise. |

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/mocker/src/replay/router/offline.rs (1)

374-397: ⚠️ Potential issue | 🟠 Major

track_prefill_tokens is still ignored during offline admission.

SequenceRequest now carries the flag, but admit_request() still computes candidate load with slots.potential_blocks_and_tokens(...), which always includes prompt-side tokens. That means offline replay keeps charging prompt-side load during worker selection even when router_track_prefill_tokens=false (notably the decode pool in disagg mode). This path needs the same potential_blocks_and_tokens_with_prefill_tracking(..., request.track_prefill_tokens) change that lib/kv-router/src/scheduling/queue.rs got.

Suggested fix

 fn admit_request(&mut self, request: PendingRequest) -> Result<usize> {
-    let (decode_blocks, prefill_tokens) = self.slots.potential_blocks_and_tokens(
-        request.token_seq.as_deref(),
-        request.isl_tokens,
-        request.overlaps.clone(),
-    );
+    let (decode_blocks, prefill_tokens) = self
+        .slots
+        .potential_blocks_and_tokens_with_prefill_tracking(
+            request.token_seq.as_deref(),
+            request.isl_tokens,
+            request.overlaps.clone(),
+            request.track_prefill_tokens,
+        );
     let scheduling_request = request.scheduling_request(decode_blocks, prefill_tokens);
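
The effect of the flag on candidate load can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Rust `potential_blocks_and_tokens_with_prefill_tracking` implementation: the function name, its parameters, and the block arithmetic are hypothetical and only show how disabling tracking removes the prompt-side token charge while leaving block accounting alone.

```python
import math


def potential_load_sketch(isl_tokens, overlap_blocks, block_size,
                          active_blocks, active_prefill_tokens,
                          track_prefill_tokens=True):
    """Estimate a worker's load if it admitted one more request.

    All names here are hypothetical; the real router tracks more state.
    """
    # Blocks the request would occupy that are not already cached on the worker.
    total_blocks = math.ceil(isl_tokens / block_size)
    new_blocks = max(total_blocks - overlap_blocks, 0)

    # Prompt tokens that still need prefill are only charged when tracking
    # is enabled (for example, not for a decode pool in disagg mode).
    if track_prefill_tokens:
        new_prefill_tokens = max(isl_tokens - overlap_blocks * block_size, 0)
    else:
        new_prefill_tokens = 0

    return active_blocks + new_blocks, active_prefill_tokens + new_prefill_tokens
```

For a 100-token prompt with 16-token blocks and two cached blocks, the block estimate is the same in both modes; only the prefill-token term differs.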

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/replay/router/offline.rs` around lines 374 - 397, The offline
admit_request path currently calls self.slots.potential_blocks_and_tokens(...)
which always counts prompt-side (prefill) tokens; change that call to use the
prefill-aware API
self.slots.potential_blocks_and_tokens_with_prefill_tracking(...,
request.track_prefill_tokens) so worker selection respects the
SequenceRequest.track_prefill_tokens flag. Update the call site in admit_request
(where decode_blocks and prefill_tokens are computed) to pass
request.track_prefill_tokens into
potential_blocks_and_tokens_with_prefill_tracking and keep the rest of the
scheduling flow (scheduling_request, selector.select_worker, SequenceRequest
construction) unchanged.

🧹 Nitpick comments (3)

lib/mocker/src/replay/offline/runtime_utils.rs (1)

79-102: Irrefutable let pattern depends on SimulationEventKind having a single variant.

The let binding on lines 88-94 destructures SimulationEventKind::WorkerCompletion directly, which is only valid while it is the sole variant. If SimulationEventKind gains additional variants in the future, the pattern becomes refutable and this binding stops compiling (a plain let rejects refutable patterns at compile time, so there is no runtime panic), and the handling of the other event kinds then has to be decided at this call site.

Consider using match with an explicit arm now, so that this decision point is already visible in the code.

♻️ Optional: Use explicit match for future-proofing

-    let SimulationEventKind::WorkerCompletion {
-        stage,
-        worker_idx,
-        completed_requests,
-        output_signals,
-        kv_events,
-    } = event.kind;
+    let (stage, worker_idx, completed_requests, output_signals, kv_events) = match event.kind {
+        SimulationEventKind::WorkerCompletion {
+            stage,
+            worker_idx,
+            completed_requests,
+            output_signals,
+            kv_events,
+        } => (stage, worker_idx, completed_requests, output_signals, kv_events),
+    };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/mocker/src/replay/offline/runtime_utils.rs` around lines 79 - 102, The
current let pattern in pop_ready_worker_completion destructures event.kind as
SimulationEventKind::WorkerCompletion and stops compiling if new variants are
added; change the code to explicitly match event.kind (from the SimulationEvent
returned by events.peek()/events.pop()) and handle the WorkerCompletion arm by
constructing and returning WorkerCompletionPayload; if other variants are added
later, give them an arm that safely returns None (or logs/handles unexpected
variants).

lib/kv-router/src/protocols.rs (1)

1030-1056: Add a legacy-payload test, not just a round-trip.

This only proves false survives when the field is present. The compatibility contract from the new serde default is that older AddRequest payloads without track_prefill_tokens still deserialize to true, so please lock that in with a fixture that omits the field.

Example regression test

+    #[test]
+    fn test_active_sequence_add_request_defaults_track_prefill_tokens_for_legacy_payloads() {
+        let legacy = r#"{"request_id":"req-123","worker":{"worker_id":7,"dp_rank":0},"data":{"AddRequest":{"token_sequence":[11,22],"isl":128,"overlap":1,"expected_output_tokens":32}},"router_id":9,"lora_name":null}"#;
+        let deserialized: ActiveSequenceEvent = serde_json::from_str(legacy).unwrap();
+
+        match deserialized.data {
+            ActiveSequenceEventData::AddRequest {
+                track_prefill_tokens,
+                ..
+            } => assert!(track_prefill_tokens),
+            _ => panic!("expected add request event"),
+        }
+    }
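
The compatibility rule being locked in here is "a missing field deserializes to the default". A minimal Python sketch of the same idea follows, separate from the Rust serde code in this PR; the `AddRequestSketch` class and `parse_add_request` helper are hypothetical, and the payload is trimmed to a few of the fields shown in the example above.

```python
import json
from dataclasses import dataclass


@dataclass
class AddRequestSketch:
    """Hypothetical stand-in for the AddRequest event payload."""
    isl: int
    overlap: int
    track_prefill_tokens: bool = True  # default applied to legacy payloads


def parse_add_request(payload: str) -> AddRequestSketch:
    data = json.loads(payload)["AddRequest"]
    return AddRequestSketch(
        isl=data["isl"],
        overlap=data["overlap"],
        # Older producers never send this key, so fall back to True.
        track_prefill_tokens=data.get("track_prefill_tokens", True),
    )
```

A round-trip test alone cannot catch a regression in this default, because a round-tripped payload always carries the field.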

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/kv-router/src/protocols.rs` around lines 1030 - 1056, The current test
only round-trips an event that includes track_prefill_tokens=false; add a
regression test that deserializes a legacy JSON payload which omits the
track_prefill_tokens field and asserts the resulting ActiveSequenceEvent
(specifically ActiveSequenceEventData::AddRequest) yields track_prefill_tokens
== true. Create a new test (or extend
test_active_sequence_add_request_serialization_preserves_track_prefill_tokens)
that constructs a JSON string representing an AddRequest without the
track_prefill_tokens key, calls serde_json::from_str::<ActiveSequenceEvent>(),
matches on ActiveSequenceEventData::AddRequest and asserts track_prefill_tokens
is true to lock in the backward-compat default behavior.

lib/bindings/python/tests/test_replay.py (1)

593-614: Consider adding a @pytest.mark.timeout decorator.

This test exercises the full synthetic disagg replay pipeline, which involves worker execution. While the speedup ratio is high (1000.0), adding a timeout guard would be consistent with the subprocess tests and prevent CI hangs if something goes wrong.

🛡️ Suggested timeout decorator

+@pytest.mark.timeout(30)
 def test_run_synthetic_trace_replay_disagg_preserves_expected_output_tokens():
     report = run_synthetic_trace_replay(

As per coding guidelines: "add @pytest.mark.timeout() for any test that may exceed 30s or uses polling/sleeps/subprocess waits"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/bindings/python/tests/test_replay.py` around lines 593 - 614, Add a
pytest timeout decorator to the
test_run_synthetic_trace_replay_disagg_preserves_expected_output_tokens function
to prevent CI hangs; annotate the function with `@pytest.mark.timeout(30)` (or
another appropriate seconds value) placed immediately above its def, and ensure
pytest is imported in the test file if not already so the decorator resolves.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/benchmarks/mocker-trace-replay.md`:
- Around line 171-178: The docs currently present staged disagg args (flags
--prefill-engine-args and --decode-engine-args and staged JSON fields) as
conveniences; update the text to state them as required for offline disagg
replay and document the validator constraints: each --prefill-engine-args JSON
must include worker_type="prefill" and each --decode-engine-args JSON must
include worker_type="decode", both staged configs must set the same block_size,
and pool sizes are controlled via --num-prefill-workers and
--num-decode-workers; mention these requirements where the staged args are
described (references: flags --prefill-engine-args, --decode-engine-args, fields
worker_type and block_size, and flags --num-prefill-workers /
--num-decode-workers) so users won’t hit validation errors.

In `@docs/mocker/mocker.md`:
- Around line 131-134: The AIC guidance and any references to dynamo.replay must
be updated to mention split engine args for disaggregated offline replay: when
using `--replay-mode offline` with disagg you should use `--prefill-engine-args`
and `--decode-engine-args` (plus `--num-prefill-workers` and
`--num-decode-workers`) instead of `--extra-engine-args`; update the AIC section
text to conditionally show the aggregated example using `--extra-engine-args`
and a separate disagg example showing `--prefill-engine-args`,
`--decode-engine-args`, and the worker flags, and change any instructions that
currently tell `dynamo.replay` users to only use `--extra-engine-args` to
include the disagg path and flags.

In `@lib/bindings/c/src/lib.rs`:
- Around line 489-492: The bookkeeping path is not receiving the disagg decode
RouterConfigOverride, so decode_router.add_request is called with None and the
assume_kv_reuse/track_prefill_tokens overrides never reach
KvRouter::add_request; fix by creating or reusing the same RouterConfigOverride
(e.g., RouterConfigOverride { overlap_score_weight: Some(0.0), assume_kv_reuse:
Some(false), track_prefill_tokens: Some(false) }) used for query-time and pass
it into the bookkeeping call instead of None (i.e., replace the None argument to
decode_router.add_request(...) with Some(router_config_override) or propagate
the existing variable) so KvRouter::add_request sees the override.

In `@lib/bindings/python/rust/llm/replay.rs`:
- Around line 642-644: The code accepts num_prefill_workers and
num_decode_workers but ignores them when the replay path stays aggregated; add a
validation in the function that constructs/dispatches the replay (the function
handling num_workers, num_prefill_workers, num_decode_workers in replay.rs) to
reject (return Err or panic with a clear message) any non-default
num_prefill_workers or num_decode_workers when prefill_engine_args and
decode_engine_args are not provided (i.e., when choosing the aggregated arm); do
the same guard where the parameters are later processed (the block around the
alternative arm handling at the section covering lines ~656-673) to ensure
callers are informed rather than silently ignored.

In `@lib/bindings/python/src/dynamo/replay/main.py`:
- Around line 21-48: In _load_engine_args, json.loads(raw_args) can produce
non-dict values (e.g., list, null, scalar) so add an explicit validation right
after raw = json.loads(raw_args) that checks isinstance(raw, dict) and raises a
ValueError (e.g., "engine-args must be a JSON object") if not; keep all
subsequent logic that reads keys from raw (worker_type handling,
planner_profile_data resolution via resolve_planner_profile_data, and final
return via MockEngineArgs.from_json) unchanged.
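
Several of the inline comments above describe validation rules for staged arguments. The Python sketch below gathers them in one place as an illustration only: the function names, the default worker count of 1, and the rule that the two staged configs must be given together are assumptions, and this is not the code in `main.py`, `replay.rs`, or `validate.rs`.

```python
import json


def load_engine_args_sketch(raw_args: str, expected_worker_type: str) -> dict:
    """Parse staged engine args and check their shape (hypothetical helper)."""
    raw = json.loads(raw_args)
    # json.loads can return a list, scalar, or None, so check before reading keys.
    if not isinstance(raw, dict):
        raise ValueError("engine-args must be a JSON object")
    if raw.get("worker_type") != expected_worker_type:
        raise ValueError(f"worker_type must be {expected_worker_type!r}")
    return raw


def select_replay_mode(prefill_engine_args=None, decode_engine_args=None,
                       num_prefill_workers=1, num_decode_workers=1):
    """Return ("aggregated", None) or ("disagg", (prefill, decode))."""
    has_prefill = prefill_engine_args is not None
    has_decode = decode_engine_args is not None
    if not has_prefill and not has_decode:
        # Reject staged worker counts that would otherwise be silently ignored.
        # The default of 1 is an assumption of this sketch.
        if (num_prefill_workers, num_decode_workers) != (1, 1):
            raise ValueError("staged worker counts require staged engine args")
        return "aggregated", None
    if has_prefill != has_decode:
        raise ValueError("prefill and decode engine args must be given together")
    prefill = load_engine_args_sketch(prefill_engine_args, "prefill")
    decode = load_engine_args_sketch(decode_engine_args, "decode")
    if prefill.get("block_size") != decode.get("block_size"):
        raise ValueError("prefill and decode block_size must match")
    return "disagg", (prefill, decode)
```

Failing early with a specific message for each rule is the point of the sketch: a caller who passes `num_decode_workers=4` without staged engine args gets an error instead of an aggregated run that ignores the value.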

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6378df3c-ba91-45aa-a8c8-1143db98f0a9

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe37a5 and b4d281c.

📒 Files selected for processing (42)

components/src/dynamo/common/configuration/groups/kv_router_args.py
components/src/dynamo/router/README.md
components/src/dynamo/router/__main__.py
docs/benchmarks/mocker-trace-replay.md
docs/components/router/README.md
docs/components/router/router-guide.md
docs/mocker/mocker.md
lib/bench/kv_router/active_sequences_bench.rs
lib/bindings/c/src/lib.rs
lib/bindings/python/rust/llm/entrypoint.rs
lib/bindings/python/rust/llm/replay.rs
lib/bindings/python/src/dynamo/replay/api.py
lib/bindings/python/src/dynamo/replay/main.py
lib/bindings/python/tests/test_replay.py
lib/kv-router/src/protocols.rs
lib/kv-router/src/scheduling/config.rs
lib/kv-router/src/scheduling/local.rs
lib/kv-router/src/scheduling/policy.rs
lib/kv-router/src/scheduling/queue.rs
lib/kv-router/src/scheduling/types.rs
lib/kv-router/src/sequences/multi_worker.rs
lib/kv-router/src/sequences/single.rs
lib/llm/src/kv_router.rs
lib/llm/src/kv_router/prefill_router.rs
lib/llm/src/kv_router/scheduler.rs
lib/llm/src/kv_router/sequence.rs
lib/mocker/src/replay/entrypoints.rs
lib/mocker/src/replay/mod.rs
lib/mocker/src/replay/offline/README.md
lib/mocker/src/replay/offline/disagg.rs
lib/mocker/src/replay/offline/events.rs
lib/mocker/src/replay/offline/mod.rs
lib/mocker/src/replay/offline/multi.rs
lib/mocker/src/replay/offline/runtime_utils.rs
lib/mocker/src/replay/offline/state.rs
lib/mocker/src/replay/router/offline.rs
lib/mocker/src/replay/router/online.rs
lib/mocker/src/replay/router/shared.rs
lib/mocker/src/replay/validate.rs
lib/mocker/src/scheduler/mod.rs
lib/mocker/src/scheduler/sglang/core.rs
lib/mocker/src/scheduler/vllm/core.rs

PeaBrane added 4 commits March 24, 2026 17:32

fix(router): stop charging decode for old prompts

8522585

Signed-off-by: PeaBrane <yanrpei@gmail.com>

Merge origin/main and keep tracker sane

3561be4

Signed-off-by: PeaBrane <yanrpei@gmail.com>

refactor replay timing without moving the clock

42dd884

Signed-off-by: PeaBrane <yanrpei@gmail.com>

feat: teach replay to disagg without drama

b4d281c

Signed-off-by: PeaBrane <yanrpei@gmail.com>

PeaBrane requested review from a team as code owners March 25, 2026 01:49

pull-request-size bot added the size/XXL label Mar 25, 2026

github-actions bot added the feat, documentation (Improvements or additions to documentation), and router (Relates to routing, KV-aware routing, etc.) labels Mar 25, 2026

coderabbitai bot reviewed Mar 25, 2026

merge main and keep replay honest

a9dd460

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 03:27 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 03:28 Inactive

fix: tighten replay guardrails and docs

b03b956

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 03:37 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 03:38 Inactive

fix: centralize replay validation before it forks

e41b87e

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 03:47 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 03:48 Inactive

feat: let disagg replay stop kv cling

e0f4342

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 04:07 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 04:11 Inactive

cache replay turns and sort stats once

d85a264

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 04:26 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 04:27 Inactive

docs: clarify free subsumes prefill cleanup

00e5b26

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 06:25 Inactive

merge mocker config tests before pytest riots

d88451b

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 07:17 Inactive

test replay planner profile paths

10bfc3f

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 08:12 Inactive

merge main without reviving flat prefill router

61c4585

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 15:56 Inactive

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 16:05 Inactive

PeaBrane disabled auto-merge March 25, 2026 17:05

model decode handoff without prefill fibs

02e4b1e

Signed-off-by: PeaBrane <yanrpei@gmail.com>

PeaBrane enabled auto-merge (squash) March 25, 2026 17:07

PeaBrane disabled auto-merge March 25, 2026 17:08

refactor replay state without moving the clock

77ac5e6

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 17:30 Inactive

PeaBrane enabled auto-merge (squash) March 25, 2026 17:44

PeaBrane disabled auto-merge March 25, 2026 17:46

batch online trace arrivals before passes bolt

6bd84f8

Signed-off-by: PeaBrane <yanrpei@gmail.com>

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 17:58 Inactive

Revert "batch online trace arrivals before passes bolt"

d09a1fc

This reverts commit 6bd84f8.

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 18:02 Inactive

PeaBrane enabled auto-merge (squash) March 25, 2026 18:02

copy-pr-bot bot temporarily deployed to GITLAB March 25, 2026 18:07 Inactive

PeaBrane merged commit 02b1c58 into main Mar 25, 2026
89 checks passed

PeaBrane deleted the rupei/disagg-replay-prep branch March 25, 2026 18:53

ishandhanani mentioned this pull request Apr 8, 2026

fix: dp_rank always 0 in non-KV router mode #7984

Merged

4 tasks

PeaBrane merged 28 commits into main from rupei/disagg-replay-prep
