[Rebase] Rebase to vLLM v0.19.0 #2475
Conversation
- Update Dockerfile.ci to install vLLM from a specific commit wheel; add flashinfer/cublas/numpy dependencies.
- Fix worker_type leak in omni_stage by using pop instead of get.

Made-with: Cursor
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…orker_type retrieval in omni_stage.py. Removed unnecessary dependency installations for flashinfer and numpy. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r and numpy, and create a symlink for python3. This enhances the CI environment setup. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…nto a single RUN statement, improving readability and efficiency of the CI environment setup. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…mmands into a single RUN statement, improving readability and efficiency of the CI environment setup." This reverts commit 1c0a71e. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…yaml for improved performance Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Updated import for OpenAIServingEmbedding in api_server.py.
- Enhanced omni_init_app_state to initialize the renderer for engine_client.
- Added handle_oov_mm_token parameter to multiple model classes for better multimodal token handling.
- Improved comments for clarity in various model files.
- Adjusted GPUARModelRunner to ensure proper handling of late-interaction runner attributes.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…sed during shutdown Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…ling during draft model runs - Changed variable name to better reflect its purpose in managing KV connector metadata. - Updated comments for improved clarity regarding the handling of speculative configurations. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Removed unnecessary blank lines in data.py to improve code readability. Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
There was a problem hiding this comment.
Pull request overview
Rebases the vllm-omni integration to vLLM v0.19.0 by aligning runner logic and type/import surfaces with upstream API changes, particularly around inputs typing, KV transfer metadata, and speculative decoding bookkeeping.
Changes:
- Update worker runners to match upstream v0.19.0 execution/state-update flow (including deferred async spec-decode corrections and KV connector metadata handling).
- Migrate multiple model/input imports from `vllm.inputs.data` / `vllm.multimodal.inputs` to `vllm.inputs`.
- Adjust the NPU runner to use the new `vllm.v1.worker.mamba_utils.preprocess_mamba` import path.
Reviewed changes
Copilot reviewed 47 out of 48 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/worker/gpu_model_runner.py | Updates persistent-batch state updates for v0.19.0, adds deferred async spec-decode correction hook, and adjusts tensor handling paths. |
| vllm_omni/worker/gpu_generation_model_runner.py | Adapts execute path to call deferred state-correction function and switches KV preemption handling to kv-connector metadata. |
| vllm_omni/worker/gpu_ar_model_runner.py | Aligns AR execution/sampling with upstream changes (state correction callback, speculative decoding arg changes). |
| vllm_omni/platforms/npu/worker/npu_ar_model_runner.py | Updates mamba preprocessing import/callsite for v0.19.0. |
| vllm_omni/patch.py | Updates TokensPrompt import path to new vllm.inputs surface. |
| vllm_omni/inputs/preprocess.py | Updates vLLM input type imports and return types to new vllm.inputs names. |
| vllm_omni/model_executor/models/voxtral_tts/voxtral_tts_audio_generation.py | Moves MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py | Migrates prompt-related imports to vllm.inputs. |
| vllm_omni/model_executor/models/qwen3_omni/qwen3_omni_moe_thinker.py | Migrates PromptType import to vllm.inputs. |
| vllm_omni/model_executor/models/mimo_audio/mimo_audio.py | Migrates multimodal typing imports to vllm.inputs. |
| vllm_omni/model_executor/models/mimo_audio/mimo_audio_llm.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/hunyuan_image3/hunyuan_image3.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/glm_image/glm_image_ar.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/cosyvoice3/cosyvoice3.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/bagel/bagel.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/benchmarks/patch/patch.py | Updates benchmark patching; introduces pybase64 import. |
| tests/e2e/online_serving/test_mimo_audio.py | Adds module-level guard to avoid collection failures in restricted environments. |
| .buildkite/test-nightly.yml | Updates nightly pipeline image tag interpolation. |
```diff
 mrope_pos_ptr += completion_part_len

-def _update_states(self, scheduler_output: "SchedulerOutput") -> None:
+def _update_states(self, scheduler_output: "SchedulerOutput"):
```
_update_states previously had an explicit -> None return type, but now returns an optional deferred-correction callback (see later returns). Please add an explicit return annotation (e.g., Callable[[], None] | None) so the method contract is clear and type checkers don’t infer Any.
```python
valid_sampled_token_count = self._get_valid_sampled_token_count()
if not valid_sampled_token_count:
    return
```
valid_sampled_token_count is indexed like an array later, and is likely a NumPy array / Torch tensor. Using if not valid_sampled_token_count: can raise an “ambiguous truth value” error for arrays/tensors. Prefer an explicit check such as is None (or len(...) == 0 if it’s a list) to avoid runtime failures during deferred spec-decode correction.
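The failure mode can be reproduced without NumPy/Torch; `FakeTensor` below is a hypothetical stand-in whose `__bool__` mimics multi-element tensor semantics, and `should_defer` shows the explicit check:

```python
class FakeTensor:
    """Stand-in mimicking torch.Tensor / np.ndarray truthiness semantics."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __bool__(self):
        # Multi-element tensors refuse implicit truth-value conversion.
        if len(self.data) != 1:
            raise RuntimeError(
                "Boolean value of a tensor with more than one element is ambiguous"
            )
        return bool(self.data[0])


def should_defer(valid_sampled_token_count):
    # Explicit check: safe for None, lists, arrays, and tensors alike.
    return valid_sampled_token_count is not None and len(valid_sampled_token_count) > 0
```

`if not valid_sampled_token_count:` raises on `FakeTensor([1, 2])`, while the explicit `is not None` / `len(...)` check does not.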
```diff
 from typing import Literal

 import aiohttp
+import pybase64 as base64
```
This introduces a hard dependency on pybase64, but it doesn’t appear to be declared anywhere in the repo’s dependency manifests. Either add pybase64 to the appropriate dependencies/extras or fall back to stdlib base64 when pybase64 isn’t installed so benchmarks don’t crash at import time.
```diff
-import pybase64 as base64
+try:
+    import pybase64 as base64
+except ImportError:
+    import base64
```
```python
except Exception:
    test_params = []
```
The broad except Exception + test_params = [] will silently skip all parametrized tests on any setup failure (e.g., transient network/auth issues), which can hide real regressions. Prefer pytest.skip(..., allow_module_level=True) (and/or catching specific expected exceptions) so skips are visible with a reason in CI output.
```diff
-except Exception:
-    test_params = []
+except Exception as exc:
+    pytest.skip(
+        f"Skipping MiMo-Audio online serving tests because module setup "
+        f"failed: {exc}",
+        allow_module_level=True,
+    )
```
```diff
 podSpec:
   containers:
-    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:c392ce21e9cf9ea65c52b866447793db10e0261c
+    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
```
This step uses ${BUILDKITE_COMMIT} while other steps in the same file use $BUILDKITE_COMMIT. Please make the syntax consistent to avoid surprising interpolation/escaping differences across shells/plugins.
```diff
-    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
+    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
```
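For context, both spellings expand identically in POSIX shells; braces only matter when the variable name abuts a following word character, so the consistency fix is purely stylistic (sketch with an illustrative value):

```shell
BUILDKITE_COMMIT=c392ce2

# Identical expansions:
echo "tag:$BUILDKITE_COMMIT"
echo "tag:${BUILDKITE_COMMIT}"

# Braces become necessary only when a valid name character follows;
# without them the shell expands the (unset) BUILDKITE_COMMIT_suffix:
echo "tag:${BUILDKITE_COMMIT}_suffix"
```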
Exactly. Please modify this.
```diff
 key: "build-wheel"
 agents:
-  queue: cpu_queue_release
+  queue: cpu_queue_premerge
```
Running this on every merge seems too frequent?
This only chooses the cluster that runs the pipeline; the frequency is controlled by the schedule in the pipeline settings.
```diff
 podSpec:
   containers:
-    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:c392ce21e9cf9ea65c52b866447793db10e0261c
+    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
```
Exactly. Please modify this.
```python
except Exception:
    test_params = []
```
```python
import pytest
from transformers import PretrainedConfig
```
Is the change in this file related to the rebase?
```diff
 from typing import Literal

 import aiohttp
+import pybase64 as base64
```
Is it necessary to depend on it? Why not use the original one?
```diff
 num_inference_steps = 1
-height = 1024
-width = 1024
+height = 512
```
The tests are run at 512×512; 1024×1024 causes a strange OOM error.
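As a rough, hypothetical illustration of why the resolution matters: diffusion-style latent (and attention) footprints grow quadratically with edge length, so 1024×1024 needs about 4× the memory of 512×512 (function name, `vae_scale`, and `channels` are illustrative assumptions, not this repo's actual model parameters):

```python
def latent_elements(height, width, vae_scale=8, channels=4):
    """Hypothetical latent-tensor element count for a diffusion-style model.

    Spatial dimensions are downscaled by the VAE factor, so element count
    scales with (height * width) and quadruples when both edges double.
    """
    return channels * (height // vae_scale) * (width // vae_scale)
```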
```diff
 for arch, (mod_folder, mod_relname, cls_name) in _OMNI_MODELS.items():
     if arch not in supported_archs:
-        ModelRegistry.register_model(arch, f"vllm_omni.model_executor.models.{mod_folder}.{mod_relname}:{cls_name}")
+        ModelRegistry.register_model(arch, f"vllm_omni.model_executor.models.{mod_folder}.{mod_relname}:{cls_name}")
```
Is this related to the rebase?
```diff
     sampling_params_list: Sequence[Any] | None = None,
     final_stage_id: int = 0,
     arrival_time: float | None = None,
+    lora_request: Any = None,
```
This is to align with the upstream AsyncLLM.add_request() and AsyncLLM.generate() in vllm.
```diff
     params=params,
     supported_tasks=self.supported_tasks,
     arrival_time=arrival_time,
+    lora_request=lora_request,
```
…vllm-omni into dev/rebase-v0.19.0
…entation - Updated VLLM_VERSION in pipeline-intel.yaml and Dockerfiles for ROCm and XPU to 0.19.0. - Modified installation instructions in quickstart.md, cuda.inc.md, and rocm.inc.md to reflect the new version. - Adjusted pre-built wheel download links and git checkout commands to point to version 0.19.0. Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
…vllm-omni into dev/rebase-v0.19.0
Please fix the DCO check. Also, the pipeline modification seems to be causing the release CI failure.
….19.0 compat) In vLLM 0.19.0, AutoConfig.from_pretrained reads model_type directly from config.json before applying hf_overrides. For models with empty config.json (e.g. CosyVoice3), this causes "Unrecognized model" error. Fix: detect empty configs and create a temporary patched config.json with model_type injected, then set hf_config_path to point to it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>
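A self-contained sketch of the described workaround (function name and structure are illustrative, not the actual patch): detect a config.json that is empty or missing `model_type`, write a patched copy into a temporary directory, and return that directory for use as `hf_config_path`.

```python
import json
import os
import tempfile


def patched_hf_config_path(config_path: str, model_type: str) -> str:
    """Return a directory whose config.json carries model_type.

    If the original config.json is empty, unreadable, or lacks model_type,
    write a patched copy with model_type injected into a temp directory and
    return it; otherwise return the original config's directory unchanged.
    """
    try:
        with open(config_path) as f:
            cfg = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        cfg = {}
    if cfg.get("model_type"):
        return os.path.dirname(config_path)
    cfg["model_type"] = model_type
    tmpdir = tempfile.mkdtemp(prefix="patched_hf_config_")
    with open(os.path.join(tmpdir, "config.json"), "w") as f:
        json.dump(cfg, f)
    return tmpdir
```

Pointing `hf_config_path` at the returned directory lets `AutoConfig.from_pretrained` find a `model_type` before `hf_overrides` are applied.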
Purpose
This PR aims to rebase the vllm version that this repo relies on to v0.19.0
Test Plan
Testing on release pipeline
https://buildkite.com/vllm/vllm-omni-rebase/builds/713/steps/canvas
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation edits to `./docs`.