[Bugfix] Sync main into dev/migrate-MR-v2 and fix build errors #2923
Merged

tzhouam merged 26 commits into vllm-project:dev/migrate-MR-v2 on Apr 20, 2026
Conversation
Author

@tzhouam PTAL. The DCO errors are due to the merge, so please dismiss them.
## Background

Build #7230 on `dev/migrate-MR-v2` surfaced three regressions, and the branch had also drifted behind `main` by 17 commits. This PR does both: sync `main` into the branch and fix the three failures.

## Changes
### 1. Merge `origin/main` into `dev/migrate-MR-v2`

Resolves:

- `vllm_omni/core/sched/omni_generation_scheduler.py`: HEAD added `import os`, main added `from __future__ import annotations`; keep both.
- `vllm_omni/model_executor/stage_configs/*.yaml`: main migrated these to `vllm_omni/deploy/*.yaml` under the schema refactor in PR #2383 ([Config Refactor][2/N] Pipeline + Deploy Config Schema). Dev's only change to those files was adding `stop_token_ids`/`detokenize` to `default_sampling_params`; those values are carried over into the new `deploy/qwen3_tts.yaml` and `deploy/qwen3_omni_moe.yaml`.

This pull-in also resolves CI failure #5 (
`simple-unit-test`: the `stage_configs` property has no setter): PR #2884 on main already fixed the `FakeAsyncOmniClass` fixture.
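For illustration, the carried-over keys might look like the fragment below in the new deploy config. The value of `stop_token_ids` is a placeholder, not the real token id, and all surrounding schema keys are omitted:

```yaml
# deploy/qwen3_tts.yaml (illustrative fragment; only the carried-over keys)
default_sampling_params:
  stop_token_ids: [151643]   # placeholder value, not the actual token id
  detokenize: true           # value illustrative
```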
### 2. [BugFix] Add Qwen2_5Omni to test_init_model_state expected set

Fixes CI failure #3 (
`modelrunner-v2-unit-test`: `test_omni_architectures_set_contains_expected`). The expected set hardcoded in the test was out of sync with `_OMNI_ARCHITECTURES`.
### 3. [BugFix] Fix MTP buffer size mismatch for Omni Talker models

Fixes CI failure #2 (`full-moon-omni-star-doc-test-with-h100`: size of tensor a (2048) must match size of tensor b (1024)).

`OmniModelState` allocated its MTP and static `inputs_embeds` buffers with `self.inputs_embeds_size` (= `hf_text_config.hidden_size` = 2048 for the Qwen3-Omni Thinker). But Talker stages replace `embed_tokens` with `codec_embedding`, whose output dim is 1024. The fix probes the real dim once at init via `model.embed_input_ids(dummy)` and uses that for buffer allocation.
### 4. [BugFix] Propagate finished_req_ids for already_finished_reqs

Fixes CI failure #4 (`qwen3-tts-base-e2e-test-modelrunner-v2`: Orchestrator thread crashed / No free indices).

The `already_finished_reqs` branch in `OmniGenerationScheduler.schedule()` only removed requests from the running queue and never added them to `self.finished_req_ids`. The worker therefore never received the finished signal and never released the corresponding `req_state` slots, so the next new request hit `AssertionError: No free indices` in `req_states.add_request`. The fix propagates finished ids to both `self.finished_req_ids` (single-client path) and `self.finished_req_ids_dict` (multi-client path), matching the upstream `_free_request` behavior.
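A simplified sketch of the propagation. The class, method name, and queue shapes are illustrative stand-ins for the real scheduler:

```python
class SchedulerSketch:
    """Illustrative subset of OmniGenerationScheduler's bookkeeping."""

    def __init__(self):
        self.running = ["req-1", "req-2"]
        self.finished_req_ids = set()      # single-client path
        self.finished_req_ids_dict = {}    # multi-client path: client -> ids

    def handle_already_finished(self, already_finished_reqs, client_of):
        for req_id in already_finished_reqs:
            self.running.remove(req_id)
            # Before the fix, only the removal above happened, so the worker
            # never freed the req_state slot ("No free indices").
            self.finished_req_ids.add(req_id)
            self.finished_req_ids_dict.setdefault(
                client_of[req_id], set()).add(req_id)


sched = SchedulerSketch()
sched.handle_already_finished(["req-1"], {"req-1": "client-0"})
print(sched.finished_req_ids)  # {'req-1'}
```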
## Test Plan

- `tests/worker_v2/test_init_model_state.py`: 5/5 passed
- `tests/core/sched/test_generation_scheduler_{finish_condition,restore}.py`: 11/11 passed
- `tests/worker_v2/` + `tests/core/sched/` full runs: 101/101 passed
- `/v1/audio/speech` requests: all OK

## CI Failure Mapping
- `full-moon-diffusion-x2v-star-accuracy-test`
- `full-moon-omni-star-doc-test-with-h100`: failure #2, fixed by change 3
- `modelrunner-v2-unit-test`: failure #3, fixed by change 2
- `qwen3-tts-base-e2e-test-modelrunner-v2`: failure #4, fixed by change 4
- `simple-unit-test`: failure #5 (`@property` setter), resolved by the main merge (PR #2884)