
[Bugfix] Sync main into dev/migrate-MR-v2 and fix build errors#2923

Merged
tzhouam merged 26 commits into vllm-project:dev/migrate-MR-v2 from Sy0307:fix/v2-sync-main-and-build-7230-bugs
Apr 20, 2026

Conversation

Contributor

@Sy0307 Sy0307 commented Apr 19, 2026

Background

Build #7230 on dev/migrate-MR-v2 surfaced three regressions, and the branch had also drifted 17 commits behind main. This PR does both: it syncs main into the branch and fixes the three failures.

Changes

1. Merge origin/main into dev/migrate-MR-v2

Resolves:

  • Import conflict in vllm_omni/core/sched/omni_generation_scheduler.py (HEAD added import os, main added from __future__ import annotations — keep both).
  • Modify/delete conflicts on five vllm_omni/model_executor/stage_configs/*.yaml files. Main migrated them to vllm_omni/deploy/*.yaml as part of the schema refactor in [Config Refactor][2/N] Pipeline + Deploy Config Schema #2383. Dev's only change to those files was adding stop_token_ids / detokenize to default_sampling_params; those values are carried over into the new deploy/qwen3_tts.yaml and deploy/qwen3_omni_moe.yaml.

This merge also resolves CI failure #5 (simple-unit-test: the stage_configs property has no setter) — PR #2884 on main already fixed the FakeAsyncOmniClass fixture.

2. [BugFix] Add Qwen2_5Omni to test_init_model_state expected set

Fixes CI failure #3 (modelrunner-v2-unit-test: test_omni_architectures_set_contains_expected). The expected set hardcoded in the test was out of sync with _OMNI_ARCHITECTURES.

3. [BugFix] Fix MTP buffer size mismatch for Omni Talker models

Fixes CI failure #2 (full-moon-omni-star-doc-test-with-h100: size of tensor a (2048) must match size of tensor b (1024)).

OmniModelState allocated its MTP and static inputs_embeds buffers with self.inputs_embeds_size (= hf_text_config.hidden_size = 2048 for Qwen3-Omni Thinker). But Talker stages replace embed_tokens with codec_embedding whose output dim is 1024. Probe the real dim once at init via model.embed_input_ids(dummy) and use that for buffer allocation.
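The probing idea can be sketched without torch (class and function names here are illustrative, not the repo's exact code):

```python
class CodecEmbeddingStub:
    """Stand-in for a Talker stage: embed_tokens has been replaced by a
    codec_embedding whose output width (1024) differs from
    hf_text_config.hidden_size (2048)."""
    EMBED_DIM = 1024

    def embed_input_ids(self, token_ids):
        # One EMBED_DIM-wide vector per input token id.
        return [[0.0] * self.EMBED_DIM for _ in token_ids]


def allocate_inputs_embeds(model, hidden_size, max_tokens):
    # Probe the real embedding width once at init instead of trusting
    # hidden_size -- the essence of the fix.
    probed_dim = len(model.embed_input_ids([0])[0])
    return [[0.0] * probed_dim for _ in range(max_tokens)]


# Buffers come out 1024 wide even though hidden_size claims 2048,
# so a later copy into them cannot hit a shape mismatch.
buf = allocate_inputs_embeds(CodecEmbeddingStub(), hidden_size=2048, max_tokens=4)
```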

4. [BugFix] Propagate finished_req_ids for already_finished_reqs

Fixes CI failure #4 (qwen3-tts-base-e2e-test-modelrunner-v2: Orchestrator thread crashed / No free indices).

The already_finished_reqs branch in OmniGenerationScheduler.schedule() only removed requests from the running queue but never added them to self.finished_req_ids. So the worker never got the finished signal, never released the corresponding req_state slots, and the next new request hit AssertionError: No free indices in req_states.add_request. Propagate to both self.finished_req_ids and self.finished_req_ids_dict to match upstream _free_request behavior.
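In toy form (attribute names mirror the description above, but the class is heavily simplified):

```python
class SchedulerSketch:
    """Toy OmniGenerationScheduler showing the propagation fix."""

    def __init__(self):
        self.running = []                  # list of (req_id, client_index)
        self.finished_req_ids = set()      # single-client path
        self.finished_req_ids_dict = {}    # multi-client path

    def schedule(self, already_finished_reqs):
        for req_id, client_index in already_finished_reqs:
            if (req_id, client_index) in self.running:
                self.running.remove((req_id, client_index))
            # The bug was stopping here: the worker never saw the finish
            # signal, req_state slots leaked, and the next request hit
            # "No free indices". The fix propagates to both structures,
            # matching upstream _free_request:
            self.finished_req_ids.add(req_id)
            self.finished_req_ids_dict.setdefault(client_index, set()).add(req_id)


sched = SchedulerSketch()
sched.running = [("req-1", 0)]
sched.schedule([("req-1", 0)])
```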

Test Plan

  • tests/worker_v2/test_init_model_state.py — 5/5 passed
  • tests/core/sched/test_generation_scheduler_{finish_condition,restore}.py — 11/11 passed
  • tests/worker_v2/ tests/core/sched/ full — 101/101 passed
  • Qwen3-TTS 0.6B server + 5 concurrent /v1/audio/speech requests — all OK
  • Qwen3-TTS 0.6B server + 3×10 concurrent stress — 30/30 OK, no regressions
  • Full CI (wait for build)

CI Failure Mapping

| Build #7230 job | Root cause | Fix in this PR |
| --- | --- | --- |
| full-moon-diffusion-x2v-star-accuracy-test | pre-existing flaky test | Already skipped by PR #2883 on main (merged in) |
| full-moon-omni-star-doc-test-with-h100 | MTP buffer 2048 vs 1024 | Commit 3 (MTP) |
| modelrunner-v2-unit-test | test expected set stale | Commit 2 (test) |
| qwen3-tts-base-e2e-test-modelrunner-v2 | No free indices race | Commit 4 (scheduler) |
| simple-unit-test | stage_configs @property setter | Fixed on main by #2884 (merged in) |

yenuo26 and others added 26 commits April 17, 2026 23:10
…_generates_video[wan22_i2v_usp2_hsdp2] (vllm-project#2883)

Signed-off-by: wangyu <410167048@qq.com>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
…t#2343)

Signed-off-by: Nick Cao <ncao@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
…ures (vllm-project#1837)

Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Signed-off-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-authored-by: Yueqian Lin <70319226+linyueqian@users.noreply.github.com>
Co-authored-by: linyueqian <linyueqian@outlook.com>
Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Signed-off-by: Joshna-Medisetty <joshna.medisetty@intel.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Signed-off-by: hsliuustc0106 <liuhongsheng4@huawei.com>
Signed-off-by: hsliu <liuhongsheng4@huawei.com>
Signed-off-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: david6666666 <david6666666@users.noreply.github.com>
Co-authored-by: david6666666 <david6666666@users.noreply.github.com>
Signed-off-by: Nick Cao <ncao@redhat.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Signed-off-by: CHEN <116010019@link.cuhk.edu.cn>
Signed-off-by: Lancer <maruixiang6688@gmail.com>
Co-authored-by: Samit <285365963@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
…2383)

Signed-off-by: lishunyang <lishunyang12@163.com>
Signed-off-by: reidliu41 <reid201711@gmail.com>
Signed-off-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: xiaohajiayou <75477391+xiaohajiayou@users.noreply.github.com>
Co-authored-by: Alex Brooks <albrooks@redhat.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
…+decode batches (vllm-project#2903)

Signed-off-by: Sy03 <1370724210@qq.com>
…memory (vllm-project#2474)

Signed-off-by: willamhou <willamhou@ceresman.com>
Co-authored-by: willamhou <willamhou@ceresman.com>
Signed-off-by: xiaohajiayou <923390377@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: SYLAR <125541396+lishunyang12@users.noreply.github.com>
…m-project#2018)

Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: yuanheng <jonathan.zhaoyh@gmail.com>
Co-authored-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Signed-off-by: lishunyang <lishunyang12@163.com>
Resolves:
- omni_generation_scheduler.py import conflict
- stage_configs/*.yaml migrated to vllm_omni/deploy/ (stop_token_ids
  and detokenize carried over from dev)

Signed-off-by: Sy03 <1370724210@qq.com>
PR vllm-project#2819 added Qwen2_5OmniForConditionalGeneration to
_OMNI_ARCHITECTURES but did not update the corresponding unit
test, causing test_omni_architectures_set_contains_expected to
fail on both simple-unit-test and modelrunner-v2-unit-test CI jobs.

Signed-off-by: Sy03 <1370724210@qq.com>
Talker models replace embed_tokens with codec_embedding whose dim
may differ from hf_text_config.hidden_size. The MTP static buffers
were allocated using self.inputs_embeds_size (= hf_text_config.hidden_size),
causing RuntimeError when .copy_() encounters a shape mismatch
(e.g. buffer=2048 vs actual embed dim=1024).

Probe the model's actual embedding dim via embed_input_ids() at
init time instead of relying on hf_text_config.hidden_size.

Signed-off-by: Sy03 <1370724210@qq.com>
The already_finished_reqs branch in OmniGenerationScheduler.schedule()
only removed requests from the running queue but never added them to
self.finished_req_ids. This meant the worker never received the
finished signal and never released the corresponding req_state slots,
triggering AssertionError: No free indices in req_states.add_request
when a subsequent new request tried to claim a slot.

Propagate finished ids to both self.finished_req_ids (single-client
path) and self.finished_req_ids_dict (multi-client path) to match the
upstream _free_request behavior.

Signed-off-by: Sy03 <1370724210@qq.com>
Signed-off-by: Sy03 <1370724210@qq.com>
@Sy0307 Sy0307 requested a review from hsliuustc0106 as a code owner April 19, 2026 18:29
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Contributor Author

Sy0307 commented Apr 19, 2026

@tzhouam PTAL. The DCO error is caused by the merge, so please dismiss it.

@tzhouam tzhouam merged commit 80441ca into vllm-project:dev/migrate-MR-v2 Apr 20, 2026
1 of 2 checks passed