[sglang-miles] Cherry-pick #24851: Add routed_experts_start_len for absolute routing slice control #24904
Code Review
This pull request introduces the routed_experts_start_len parameter, allowing users to specify an absolute start position for returned MoE routing information. The changes propagate this parameter through the API protocols, engine entrypoints, scheduling logic, and the expert capturer layer. Review feedback identifies a potential NameError due to a missing logger import in the output processor, inconsistent validation logic in the disaggregation receiver, brittle hardcoded constants in the new tests, and a type hint mismatch in a test utility function.
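To make the "absolute start position" idea concrete, here is a minimal sketch rather than the sglang implementation. It assumes the routing information holds one row per token position; the helper name `slice_routed_experts` and the expert ids are hypothetical.

```python
from typing import List, Sequence


def slice_routed_experts(
    rows: Sequence[Sequence[int]], start_len: int = 0
) -> List[List[int]]:
    """Return routing rows for token positions start_len .. len(rows) - 1.

    Hypothetical helper: rows[i] holds the expert ids chosen for token i.
    """
    if start_len < 0:
        raise ValueError("start_len must be non-negative")
    # An absolute start index simply drops every row before start_len.
    return [list(r) for r in rows[start_len:]]


# Four tokens with top-2 routing per token (made-up expert ids).
routing = [[0, 3], [1, 2], [5, 7], [4, 6]]
tail = slice_routed_experts(routing, start_len=2)
```

With the default `start_len=0` the full table comes back, which matches the backward-compatible default described in the PR.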
```python
logger.warning(
    "routed_experts row-count mismatch for req %s: got %d, "
    "expected %d (seqlen=%d, cached_tokens=%d, start_len=%s). "
    "This indicates a silent bug.",
    req.rid,
    req.routed_experts.shape[0],
    expected_rows,
    req.seqlen,
    req.cached_tokens,
    req.routed_experts_start_len,
)
```
This is wrong; `logger` is defined above.
```python
_QWEN3_30B_A3B_NUM_LAYERS = 48
_QWEN3_30B_A3B_TOPK = 8
```
Hardcoding model-specific constants like _QWEN3_30B_A3B_NUM_LAYERS and _QWEN3_30B_A3B_TOPK makes the test brittle if the default test model changes. It would be better to derive these values from the model configuration or the response metadata to ensure the test remains valid across different MoE architectures.
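One way to act on this suggestion, sketched under assumptions: read the layer count and top-k from a config mapping instead of literals. The keys `num_hidden_layers` and `num_experts_per_tok` follow Hugging Face-style config naming; the helper `expected_routing_shape` and the `(tokens, layers, top_k)` layout are hypothetical and may not match what the test actually receives.

```python
from typing import Mapping, Tuple


def expected_routing_shape(
    config: Mapping[str, int], num_tokens: int
) -> Tuple[int, int, int]:
    """Compute (tokens, layers, top_k) from config values, not literals."""
    num_layers = config["num_hidden_layers"]
    top_k = config["num_experts_per_tok"]
    return (num_tokens, num_layers, top_k)


# Values equal to the constants in the test under review.
qwen3_like = {"num_hidden_layers": 48, "num_experts_per_tok": 8}
shape = expected_routing_shape(qwen3_like, num_tokens=16)
```

If the default test model changes, only the config source changes and the assertions stay the same. Models where some layers are dense would need an MoE-layer count rather than the total layer count.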
```python
        if hasattr(cls, "process") and cls.process:
            kill_process_tree(cls.process.pid)

    def _send(self, payload: dict) -> dict:
```
Summary
Cherry-pick of #24851 (merge commit d82e339) onto sglang-miles.

- `routed_experts_start_len: int = 0` across the full request lifecycle (GenerateReqInput, TokenizedGenerateReqInput, OpenAI protocol, Engine, Req, tokenizer_manager, scheduler, session_controller, encode_receiver, serving_chat/completions)
- `start_len` is non-negative and <= prompt_tokens
- `maybe_collect_routed_experts` honors `start_len`, early-returns when `return_routed_experts` is False, and logs row-count mismatches
- `get_routed_experts` gains `start_len` parameter with defensive clamping
- `test_return_routed_experts.py`

Conflict resolution:

- `docs_new/` files (directory doesn't exist on sglang-miles)
- `state_capturer/base.py` (sglang-miles uses `layers/moe/routed_experts_capturer.py`)
- `return_indexer_topk`/`maybe_collect_indexer_topk` references (not part of [Session R3] Add routed_experts_start_len for absolute routing slice control #24851, leaked from upstream context)
- `start_len` parameter directly to sglang-miles `get_routed_experts` method instead of upstream's `get_topk`
- `routed_experts_start_len` additions

Test plan

- `TestReturnRoutedExperts` tests should pass (no regression)
- `TestRoutedExpertsStartLen` tests cover default behavior, row-count correctness, bounds checking, and cache-hit interaction

Made with Cursor
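The bounds check and the defensive clamping mentioned in the summary can be sketched as two small functions. This is an illustration under assumptions, not the code from the PR: the names `validate_start_len` and `clamp_start_len` are hypothetical, and the actual error type and messages may differ.

```python
def validate_start_len(start_len: int, prompt_tokens: int) -> int:
    """Reject a start_len outside [0, prompt_tokens] at request time."""
    if start_len < 0:
        raise ValueError(f"start_len must be non-negative, got {start_len}")
    if start_len > prompt_tokens:
        raise ValueError(
            f"start_len ({start_len}) must be <= prompt_tokens ({prompt_tokens})"
        )
    return start_len


def clamp_start_len(start_len: int, seqlen: int) -> int:
    """Defensively clamp start_len into [0, seqlen] instead of raising."""
    return max(0, min(start_len, seqlen))
```

Validating at the API boundary gives the caller a clear error, while clamping deeper in the stack keeps an out-of-range value from producing an invalid slice if it slips through.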