test(router): cover round-robin unset dp-rank flow #7991

Merged
PeaBrane merged 9 commits into main from rupei/round-robin-dp-rank-test
Apr 8, 2026

Conversation

@PeaBrane PeaBrane commented Apr 8, 2026

Add a disagg e2e test that verifies round-robin prefill DP-rank selection across bootstrap and non-bootstrap paths. Teach the mocker to round-robin unset dp_rank so the test exercises the same behavior the real engine uses.

Summary by CodeRabbit

Release Notes

  • New Features

    • Improved round-robin DP rank assignment for distributed prefill operations, enabling better load balancing across nodes.
  • Tests

    • Added comprehensive E2E tests for disaggregated round-robin prefill routing with multiple configuration scenarios and registration orders.
    • Expanded test coverage for KV cache distribution validation across DP ranks.
  • Chores

    • Enhanced test portability by launching subprocesses with the active Python interpreter (sys.executable) instead of a hardcoded python3.
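
A minimal sketch of the portability change, assuming the test harness builds argument lists for `subprocess`. The helpers `build_command` and `run_snippet` are hypothetical names used for illustration and are not taken from the PR.

```python
import subprocess
import sys


def build_command(module: str, *args: str) -> list[str]:
    """Build a command that runs `module` with the interpreter running this test.

    Hypothetical helper for illustration; not a function from the PR.
    """
    # sys.executable is the path of the current interpreter, so an active
    # virtualenv is respected. A bare "python3" is resolved through PATH and
    # may select a different installation.
    return [sys.executable, "-m", module, *args]


def run_snippet(code: str) -> str:
    """Run a short Python snippet in a child process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The design point is that the child process shares the parent's installed packages, which matters when the test suite runs inside an environment that a PATH lookup would not find.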

ishandhanani and others added 6 commits April 8, 2026 07:20
…print

PrefillRouter::query_prefill_worker returns Option<u32> for dp_rank.
The C FFI wrapper was declaring u32, causing E0308 in clippy. Map None
to u32::MAX (NO_DP_RANK sentinel) so the Python side sees _DP_RANK_UNSET.
Signed-off-by: PeaBrane <yanrpei@gmail.com>
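
A sketch of how the receiving Python side could interpret the sentinel described in this commit message. The name `_DP_RANK_UNSET` comes from the message; its value is assumed here to equal `u32::MAX`, and `decode_dp_rank` is a hypothetical helper rather than code from the repository.

```python
from typing import Optional

# Assumed to mirror the Rust-side NO_DP_RANK sentinel (u32::MAX).
_DP_RANK_UNSET = 2**32 - 1


def decode_dp_rank(raw: int) -> Optional[int]:
    """Translate a raw u32 from the FFI boundary into an optional DP rank."""
    if not 0 <= raw <= _DP_RANK_UNSET:
        raise ValueError(f"dp_rank {raw} does not fit in a u32")
    # The sentinel stands for "no rank chosen"; every other value is a real rank.
    return None if raw == _DP_RANK_UNSET else raw
```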
@PeaBrane PeaBrane requested review from a team as code owners April 8, 2026 15:15
@github-actions github-actions bot added the test label Apr 8, 2026
Base automatically changed from ishan/rip-bruh to main April 8, 2026 15:47
coderabbitai bot commented Apr 8, 2026

Walkthrough

Modified MockEngine to assign DP ranks via atomic counter-based round-robin when not explicitly provided in requests. Added router test infrastructure including a helper function for verifying disaggregated prefill DP rank distribution and a parameterized E2E test. Updated Python process invocations to use sys.executable instead of hardcoded interpreter paths.
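
The counter-based resolution can be sketched in Python as follows. The real change lives in Rust (`lib/llm/src/mocker.rs`); only the names `unset_dp_rank_counter` and `resolve_dp_rank` come from this walkthrough. The class `DpRankResolver`, the `dp_size` parameter, and the choice to pass explicit ranks through unchanged are assumptions made for illustration.

```python
import itertools
import threading
from typing import Optional


class DpRankResolver:
    """Hypothetical Python stand-in for the mocker's DP-rank resolution."""

    def __init__(self, dp_size: int) -> None:
        if dp_size < 1:
            raise ValueError("dp_size must be at least 1")
        self.dp_size = dp_size
        # Plays the role of the atomic counter; the lock makes each
        # increment safe when several threads resolve ranks at once.
        self._unset_dp_rank_counter = itertools.count()
        self._lock = threading.Lock()

    def resolve_dp_rank(self, requested: Optional[int]) -> int:
        """Return the explicit rank if given, else the next round-robin rank."""
        if requested is not None:
            # Assumption: an explicit rank is used as-is and does not
            # advance the counter.
            return requested
        with self._lock:
            ticket = next(self._unset_dp_rank_counter)
        return ticket % self.dp_size
```

With `dp_size=2`, successive unset requests resolve to ranks 0, 1, 0, 1, while a request that already carries a rank keeps it.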

Changes

| Cohort / File(s) | Summary |
|---|---|
| Engine DP rank resolution<br>`lib/llm/src/mocker.rs` | Added unset_dp_rank_counter atomic field and resolve_dp_rank() method to assign pseudo-round-robin DP ranks to requests lacking explicit routing. Updated AsyncEngine::generate to use the new resolver. |
| Router test infrastructure<br>`tests/router/common.py` | Added _test_router_decisions_disagg_round_robin_prefill_dp_rank() helper that verifies KV cache distribution across DP ranks by sending multiple requests and comparing stored block count deltas per rank. |
| Router process initialization<br>`tests/router/router_process.py` | Replaced hardcoded python3 with sys.executable in FrontendRouterProcess and DirectRouterProcess initialization to use the active interpreter. |
| Router E2E testing<br>`tests/router/test_router_e2e_with_mockers.py` | Added sys.executable usage in mocker command building. Imported and integrated new test helper. Added parameterized E2E test test_router_decisions_disagg_round_robin_prefill_dp_rank covering registration order and bootstrap scenarios with separate prefill/decode mockers. |
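
The delta comparison that the test helper is described as performing can be sketched as below. This is not the helper from `tests/router/common.py`; both function names are hypothetical, and the even-growth check assumes identically sized prompts and a request count that is a multiple of the number of ranks.

```python
from typing import Iterable, Mapping


def stored_block_deltas(
    before: Mapping[int, int], after: Mapping[int, int]
) -> dict[int, int]:
    """Return per-DP-rank growth in stored block count between two snapshots."""
    ranks = set(before) | set(after)
    return {rank: after.get(rank, 0) - before.get(rank, 0) for rank in sorted(ranks)}


def assert_round_robin(deltas: Mapping[int, int], expected_ranks: Iterable[int]) -> None:
    """Check that every expected rank stored new blocks and that growth is even."""
    expected = list(expected_ranks)
    missing = [rank for rank in expected if deltas.get(rank, 0) <= 0]
    assert not missing, f"ranks received no new blocks: {missing}"
    # Assumption: equal-sized requests spread evenly give every rank the
    # same number of newly stored blocks.
    growth = {deltas[rank] for rank in expected}
    assert len(growth) == 1, f"uneven growth across ranks: {dict(deltas)}"
```

Comparing deltas rather than absolute counts keeps the check independent of whatever blocks were stored before the measured requests were sent.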

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description is concise and covers the key intent, but it lacks the structured format (Overview, Details, Where to start, Related Issues) specified in the repository template. | Restructure the description using the template sections: Overview, Details, Where should the reviewer start, and Related Issues to improve clarity and consistency. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: adding test coverage for round-robin DP-rank selection when dp-rank is unset, which aligns with the actual changes across mocker and test files. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/router/test_router_e2e_with_mockers.py (1)

1283-1375: Extract the disagg worker launch matrix into a shared helper.

This repeats the same registration_order branching and nested DisaggMockerProcess setup as test_router_decisions_disagg, so the two E2E paths can drift independently. A small helper/fixture that yields (prefill_workers, decode_workers) would keep this logic in one place.

As per coding guidelines, "Do not copy-paste test infrastructure; reuse and refactor shared setup logic into fixtures or tests/utils/."
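
One possible shape for such a shared helper is sketched below using `contextlib.ExitStack`. It is written against generic context-manager factories rather than `DisaggMockerProcess`, whose constructor is not shown in this thread; the helper name `launch_disagg_workers` and the `registration_order` values `"prefill_first"` and `"decode_first"` are assumptions.

```python
from contextlib import ExitStack, contextmanager
from typing import Callable, ContextManager, Iterator, Tuple, TypeVar

P = TypeVar("P")
D = TypeVar("D")


@contextmanager
def launch_disagg_workers(
    make_prefill: Callable[[], ContextManager[P]],
    make_decode: Callable[[], ContextManager[D]],
    registration_order: str,
) -> Iterator[Tuple[P, D]]:
    """Start prefill and decode workers in the requested order and yield both."""
    with ExitStack() as stack:
        if registration_order == "prefill_first":
            prefill = stack.enter_context(make_prefill())
            decode = stack.enter_context(make_decode())
        elif registration_order == "decode_first":
            decode = stack.enter_context(make_decode())
            prefill = stack.enter_context(make_prefill())
        else:
            raise ValueError(f"unknown registration_order: {registration_order!r}")
        # ExitStack stops the workers in reverse start order on exit,
        # including when the test body raises.
        yield prefill, decode
```

Each test would then pass two factories that construct its own worker processes, so the ordering branch exists in one place only.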

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/router/test_router_e2e_with_mockers.py` around lines 1283 - 1375, The
test duplicates the disaggregated worker launch matrix (registration_order
branching and nested DisaggMockerProcess usage) already present in
test_router_decisions; refactor by extracting that setup into a shared
helper/fixture (e.g., a fixture in tests/utils or a helper function) that yields
(prefill_workers, decode_workers) and accepts params like namespace,
prefill_mocker_args, decode_mocker_args, registration_order, request_plane, and
enable_bootstrap; update
test_router_decisions_disagg_round_robin_prefill_dp_rank and
test_router_decisions_disagg to call the new helper/fixture instead of repeating
the DisaggMockerProcess nesting so both tests reuse the same launch logic and
avoid divergence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/router/common.py`:
- Around line 1814-1898: The test can race because wait_for_frontend_ready()
only checks /v1/models; add a warm-up loop to exercise the frontend round-robin
path before measuring DP-rank deltas: after you verify worker_ids (using
client.instance_ids()) and before taking baseline_counts via
observer_router.dump_events(), perform a small number of dummy chat POSTs to
chat_url (at least prefill_workers.num_workers requests, using the same minimal
payload used later) and await their responses (with short sleeps) so the
frontend finishes discovering the prefill pool and round-robins to all prefill
workers.
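
The suggested warm-up can be sketched independently of the HTTP client. Here `send_request` is a hypothetical coroutine that would POST the minimal chat payload to `chat_url` and await the response; `warm_up_round_robin` and the `pause_s` parameter are likewise illustrative names, not code from the PR.

```python
import asyncio
from typing import Awaitable, Callable


async def warm_up_round_robin(
    send_request: Callable[[], Awaitable[None]],
    num_prefill_workers: int,
    pause_s: float = 0.1,
) -> int:
    """Send one throwaway request per prefill worker before taking a baseline."""
    sent = 0
    for _ in range(num_prefill_workers):
        # Each request is awaited so the frontend has routed it before
        # the next one is issued.
        await send_request()
        sent += 1
        # A short pause gives worker discovery time to settle.
        await asyncio.sleep(pause_s)
    return sent
```

The baseline snapshot would be taken only after this coroutine returns, so the warm-up traffic is excluded from the measured deltas.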

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ba95e2c9-e3aa-417c-a244-957d17793e8c

📥 Commits

Reviewing files that changed from the base of the PR and between a210efa and 67993ef.

📒 Files selected for processing (4)
  • lib/llm/src/mocker.rs
  • tests/router/common.py
  • tests/router/router_process.py
  • tests/router/test_router_e2e_with_mockers.py

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane enabled auto-merge (squash) April 8, 2026 17:50
@PeaBrane PeaBrane merged commit c09abf7 into main Apr 8, 2026
95 checks passed
@PeaBrane PeaBrane deleted the rupei/round-robin-dp-rank-test branch April 8, 2026 18:12
3 participants