[BugFix] Fix V2 model runner profile_run crash for Omni Talker stages #2819
Conversation
For bug 2 (OmniOutput handling), you now set self._last_aux_output = None for OmniOutput but preserve it for the (tensor, dict) tuple path. Is this intentional? If OmniOutput returns aux_dict, should we store it like the tuple path does? |
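As background on why the original intercept misses this case: a NamedTuple instance passes `isinstance(x, tuple)` but reports its full field count, so a `len == 2` guard skips it. A minimal sketch (the field names besides `text_hidden_states` are made up for illustration):

```python
from typing import NamedTuple

# Stand-in with the same shape as the real OmniOutput (4 fields);
# only text_hidden_states is a field name taken from the PR, the
# rest are placeholders.
class OmniOutput(NamedTuple):
    text_hidden_states: object
    aux_a: object
    aux_b: object
    aux_c: object

out = OmniOutput("hidden", 1, 2, 3)

# It *is* a tuple, but len(out) == 4, so an intercept guarded by
# `isinstance(out, tuple) and len(out) == 2` never fires and the
# whole NamedTuple leaks through as hidden_states.
print(isinstance(out, tuple), len(out))  # → True 4
```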
BLOCKER scan:
OVERALL: NO BLOCKERS
VERDICT: COMMENT

This is a focused bug fix that addresses four specific issues with the V2 model runner for Omni Talker stages. The fixes are well reasoned, and the local test results show all 5 test configurations passing successfully.

Minor suggestion: consider adding a regression test for these specific bugs to prevent them from reoccurring. While the existing tests cover the happy path, a targeted test that verifies the fix (e.g., checking that profile_run succeeds for Talker stages with has_preprocess=True) would be valuable.

Note: the DCO check is currently failing - please add commit sign-off (e.g., with `git commit -s`).
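A regression test of the suggested shape might look like the sketch below. Everything here is hypothetical (a toy runner distilled from the bug description, not the real V2 runner API); it only pins down the behavior being asked for:

```python
# Toy stand-in distilled from the bug report: a runner whose encoder
# path allocates a buffer from hf_text_config.hidden_size (2048)
# while the Talker stage actually embeds at 1024 dims.

class FakeTalkerRunner:
    def __init__(self, has_preprocess: bool):
        self.has_preprocess = has_preprocess
        self.buffer_dim = 2048   # from hf_text_config.hidden_size
        self.embed_dim = 1024    # Talker's real embedding width

    def profile_run(self):
        # Fixed behavior: skip the MM/encoder path entirely when the
        # stage injects embeddings via run_preprocess() instead.
        if self.has_preprocess:
            return "ok"
        if self.buffer_dim != self.embed_dim:
            raise RuntimeError(
                f"expanded size ({self.buffer_dim}) must match "
                f"existing size ({self.embed_dim})")
        return "ok"


def test_profile_run_talker_with_preprocess():
    # The regression being guarded: profile_run must succeed for a
    # Talker stage with has_preprocess=True.
    assert FakeTalkerRunner(has_preprocess=True).profile_run() == "ok"


test_profile_run_talker_with_preprocess()
```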
…tages

Fix four V2 model runner bugs that crash during profile_run for Omni Talker stages:

1. Disable the encoder_runner MM path for preprocess models — Talker shares the same architecture as Thinker, so the multimodal registry marks it supports_mm_inputs=True. But encoder_runner allocates the inputs_embeds buffer using hf_text_config.hidden_size (2048) while Talker's actual embedding dim is 1024, causing a shape mismatch during dummy_run.
2. Handle OmniOutput (a NamedTuple, len=4) in execute_model — the existing tuple intercept only checks len==2, missing OmniOutput and leaving hidden_states as a non-tensor that fails the final assert.
3. Adapt the Qwen2.5-Omni get_mrope_input_positions signature for V2 RopeState — V2 calls (input_tokens, mm_features), but the old signature required keyword-only args. Delegate to the thinker's implementation, which extracts grid data from mm_features.
4. Register Qwen2_5OmniForConditionalGeneration in _OMNI_ARCHITECTURES so V2 uses OmniModelState instead of DefaultModelState.

Signed-off-by: Sy03 <1370724210@qq.com>
Force-pushed from c614190 to 2c16696.
```python
# make_omni_output can retrieve them.
if hasattr(self.model, "_last_captured_layers"):
    self.model._last_captured_layers = second
hidden_states = first
```
For the hidden-state extraction logic, can we do something like the following to handle `_last_aux_output`?

```python
if isinstance(model_output, tuple) and len(model_output) == 2:
    first, second = model_output
    if isinstance(first, torch.Tensor):
        self._last_aux_output = second
    if hasattr(self.model, "_last_captured_layers"):
        self.model._last_captured_layers = second
elif isinstance(model_output, (OmniOutput, torch.Tensor)):
    first = model_output
else:
    raise TypeError(f"Unexpected model output type: {type(model_output)}")
hidden_states = (model_output.text_hidden_states
                 if isinstance(model_output, OmniOutput) else first)
```
- Hoist `_last_aux_output = None` before the branches to prevent state leakage
- Remove the over-defensive `isinstance(first, torch.Tensor)` inner check
- Replace the silent fallback with `raise TypeError` for unexpected output types

Signed-off-by: Sy03 <1370724210@qq.com>
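Those three changes can be distilled into a runnable sketch. Stand-in classes replace `torch.Tensor` and the real `OmniOutput`; this shows the control flow only, not the actual runner code:

```python
from typing import NamedTuple


class Tensor:  # stand-in for torch.Tensor
    pass


class OmniOutput(NamedTuple):  # stand-in, 4 fields like the real one
    text_hidden_states: object
    aux_a: object
    aux_b: object
    aux_c: object


class Runner:
    def __init__(self, model=None):
        self.model = model
        self._last_aux_output = None

    def extract_hidden_states(self, model_output):
        # Hoisted before the branches so stale aux output never
        # leaks from a previous step.
        self._last_aux_output = None
        if isinstance(model_output, OmniOutput):
            # Checked before the generic tuple path for clarity; as a
            # 4-field NamedTuple it would fail the len == 2 guard anyway.
            return model_output.text_hidden_states
        if isinstance(model_output, tuple) and len(model_output) == 2:
            first, second = model_output
            self._last_aux_output = second
            if hasattr(self.model, "_last_captured_layers"):
                self.model._last_captured_layers = second
            return first
        if isinstance(model_output, Tensor):
            return model_output
        # Fail loudly instead of tripping the downstream assert.
        raise TypeError(
            f"unexpected model output type: {type(model_output)!r}")
```

The explicit raise replaces the silent fallback, so an unhandled output type surfaces immediately rather than as a confusing non-tensor assertion later.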
PR vllm-project#2819 added Qwen2_5OmniForConditionalGeneration to _OMNI_ARCHITECTURES but did not update the corresponding unit test, causing test_omni_architectures_set_contains_expected to fail on both simple-unit-test and modelrunner-v2-unit-test CI jobs. Signed-off-by: Sy03 <1370724210@qq.com>
Purpose
Fix four V2 model runner bugs that crash during `profile_run` for Omni Talker stages (Qwen3-Omni and Qwen2.5-Omni). These bugs block V2 runner adoption for all Omni models with Talker stages. Reproduces CI failures in buildkite builds #6704 (H100) and #6691 (L4).
Root causes and fixes:
1. **encoder_runner buffer shape mismatch** — Talker shares the `Qwen3OmniMoeForConditionalGeneration` architecture with Thinker, so the multimodal registry marks `supports_mm_inputs=True`. encoder_runner allocates the `inputs_embeds` buffer using `hf_text_config.hidden_size` (2048), but Talker's actual embedding dim is 1024 → `RuntimeError: expanded size (2048) must match existing size (1024)`. Fix: disable the MM path for models with `has_preprocess` (embeddings are injected via `run_preprocess()`, not encoder_runner).

2. **`OmniOutput` not handled in `execute_model`** — `OmniOutput` is a NamedTuple (len=4), but the tuple intercept only checks `len == 2`, leaving `hidden_states` as a non-tensor → `AssertionError`. Fix: check `isinstance(model_output, OmniOutput)` first and extract `text_hidden_states`.

3. **Qwen2.5-Omni M-RoPE signature mismatch** — V2 `RopeState` calls `get_mrope_input_positions(input_tokens, mm_features)`, but the old signature required keyword-only args (`hf_config`, `image_grid_thw`, etc.) → `TypeError`. Fix: the new signature accepts all params as optional and delegates to `thinker.get_mrope_input_positions()`, which extracts grid data from `mm_features` via `gather_kwargs`. The old implementation is preserved as `_get_mrope_input_positions_v1`.

4. **Qwen2.5-Omni not in `_OMNI_ARCHITECTURES`** — V2 fell back to `DefaultModelState` (no `run_preprocess`) → `AttributeError`. Fix: register `Qwen2_5OmniForConditionalGeneration`.

Test Plan
Test Result
All 5 test configurations pass with `returncode=0` and `Processed prompts: 100%`. The V2 runner previously crashed at `profile_run` for all Omni Talker stages; after the fix, all stages initialize and run inference successfully.

cc @Fattysand @tzhouam
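The signature adaptation in fix 3 follows a general pattern (keep the old keyword-only parameters as optionals, and delegate to an implementation that derives them from the new positional argument) which can be sketched as follows. All names here are illustrative stand-ins, not the real vLLM API:

```python
# Legacy callee required keyword-only kwargs; the V2 caller passes
# only (input_tokens, mm_features) positionally.

def _get_mrope_input_positions_v1(input_tokens, *, hf_config,
                                  image_grid_thw):
    # Old implementation, kept for callers that still pass kwargs.
    return ("v1", len(input_tokens), image_grid_thw)


def get_mrope_input_positions(input_tokens, mm_features=None, *,
                              hf_config=None, image_grid_thw=None):
    if hf_config is not None:
        # Legacy keyword path still works unchanged.
        return _get_mrope_input_positions_v1(
            input_tokens, hf_config=hf_config,
            image_grid_thw=image_grid_thw)
    # V2 path: pull grid data out of the mm_features the caller gave us.
    grids = [f["grid_thw"] for f in (mm_features or [])]
    return ("v2", len(input_tokens), grids)


# A V2-style positional call now works instead of raising TypeError:
print(get_mrope_input_positions([1, 2, 3],
                                [{"grid_thw": (1, 2, 2)}]))
# → ('v2', 3, [(1, 2, 2)])
```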