Skip to content

[sglang-miles] Cherry-pick #24851: Add routed_experts_start_len for absolute routing slice control#24904

Merged
ByronHsu merged 1 commit into
sglang-milesfrom
byron/cp-24851-to-sglang-miles
May 10, 2026
Merged

[sglang-miles] Cherry-pick #24851: Add routed_experts_start_len for absolute routing slice control#24904
ByronHsu merged 1 commit into
sglang-milesfrom
byron/cp-24851-to-sglang-miles

Conversation

@ByronHsu
Copy link
Copy Markdown
Collaborator

Summary

Cherry-pick of #24851 (merge commit d82e339) onto sglang-miles.

  • Adds routed_experts_start_len: int = 0 across the full request lifecycle (GenerateReqInput, TokenizedGenerateReqInput, OpenAI protocol, Engine, Req, tokenizer_manager, scheduler, session_controller, encode_receiver, serving_chat/completions)
  • Scheduler validates start_len is non-negative and <= prompt_tokens
  • maybe_collect_routed_experts honors start_len, early-returns when return_routed_experts is False, and logs row-count mismatches
  • get_routed_experts gains start_len parameter with defensive clamping
  • Tests added in test_return_routed_experts.py

Conflict resolution:

  • Multiple conflicts due to upstream main evolution (docs_new directory, state_capturer refactoring, indexer_topk module). Resolution:
    • Removed docs_new/ files (directory doesn't exist on sglang-miles)
    • Removed state_capturer/base.py (sglang-miles uses layers/moe/routed_experts_capturer.py)
    • Removed return_indexer_topk / maybe_collect_indexer_topk references (not part of [Session R3] Add routed_experts_start_len for absolute routing slice control #24851, leaked from upstream context)
    • Added start_len parameter directly to sglang-miles get_routed_experts method instead of upstream's get_topk
    • Content conflicts in io_struct, schedule_batch, scheduler, tokenizer_manager, scheduler_output_processor_mixin resolved by taking only the routed_experts_start_len additions

Test plan

  • Existing TestReturnRoutedExperts tests should pass (no regression)
  • New TestRoutedExpertsStartLen tests cover default behavior, row-count correctness, bounds checking, and cache-hit interaction

Made with Cursor

…control (#24851)

Co-authored-by: Byron Hsu <byron@periodiclabs.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: zyzshishui <zyzshishui@gmail.com>
Co-authored-by: Yuzhen Zhou <82826991+zyzshishui@users.noreply.github.com>
(cherry picked from commit d82e339)
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the routed_experts_start_len parameter, allowing users to specify an absolute start position for returned MoE routing information. The changes propagate this parameter through the API protocols, engine entrypoints, scheduling logic, and the expert capturer layer. Review feedback identifies a potential NameError due to a missing logger import in the output processor, inconsistent validation logic in the disaggregation receiver, brittle hardcoded constants in the new tests, and a type hint mismatch in a test utility function.

Comment on lines +130 to +140
logger.warning(
"routed_experts row-count mismatch for req %s: got %d, "
"expected %d (seqlen=%d, cached_tokens=%d, start_len=%s). "
"This indicates a silent bug.",
req.rid,
req.routed_experts.shape[0],
expected_rows,
req.seqlen,
req.cached_tokens,
req.routed_experts_start_len,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logger variable is used here but it does not appear to be imported or defined within this mixin file. This will likely result in a NameError at runtime when a row-count mismatch occurs. Please ensure logger is imported from sglang.srt.utils or defined at the module level.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong. it is above

Comment thread python/sglang/srt/managers/scheduler.py
Comment on lines +38 to +39
_QWEN3_30B_A3B_NUM_LAYERS = 48
_QWEN3_30B_A3B_TOPK = 8
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding model-specific constants like _QWEN3_30B_A3B_NUM_LAYERS and _QWEN3_30B_A3B_TOPK makes the test brittle if the default test model changes. It would be better to derive these values from the model configuration or the response metadata to ensure the test remains valid across different MoE architectures.

if hasattr(cls, "process") and cls.process:
kill_process_tree(cls.process.pid)

def _send(self, payload: dict) -> dict:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The type hint for _send indicates it returns a dict, but the implementation returns a requests.Response object. This mismatch can be confusing for static analysis and future maintenance.

Suggested change
def _send(self, payload: dict) -> dict:
def _send(self, payload: dict) -> requests.Response:

@ByronHsu ByronHsu merged commit a929eb7 into sglang-miles May 10, 2026
7 checks passed
@ByronHsu ByronHsu deleted the byron/cp-24851-to-sglang-miles branch May 10, 2026 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant