
[Rebase] Rebase to vllm v0.19.0 #2475

Merged
Gaohan123 merged 102 commits into main from dev/rebase-v0.19.0 on Apr 4, 2026

Conversation

@tzhouam
Collaborator

@tzhouam tzhouam commented Apr 3, 2026

Purpose

This PR rebases the vLLM version that this repo relies on to v0.19.0.

Test Plan

Testing on release pipeline
https://buildkite.com/vllm/vllm-omni-rebase/builds/713/steps/canvas

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


tzhouam added 30 commits March 10, 2026 01:46
- Update Dockerfile.ci to install vLLM from specific commit wheel,
  add flashinfer/cublas/numpy dependencies
- Fix worker_type leak in omni_stage by using pop instead of get

Made-with: Cursor

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
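
A minimal sketch of the worker_type fix described in the commit above (the dict name below is hypothetical, not the actual omni_stage code):

# Using pop() consumes the key so it is not forwarded again downstream;
# get() would leave it in the kwargs and "leak" it into later constructors.
stage_kwargs = {"worker_type": "generation", "model": "some-model"}
worker_type = stage_kwargs.pop("worker_type", None)
assert "worker_type" not in stage_kwargs  # safe to pass stage_kwargs along now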
…orker_type retrieval in omni_stage.py. Removed unnecessary dependency installations for flashinfer and numpy.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r and numpy, and create a symlink for python3. This enhances the CI environment setup.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…nto a single RUN statement, improving readability and efficiency of the CI environment setup.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…mmands into a single RUN statement, improving readability and efficiency of the CI environment setup."

This reverts commit 1c0a71e.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…yaml for improved performance

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Updated import for OpenAIServingEmbedding in api_server.py.
- Enhanced omni_init_app_state to initialize renderer for engine_client.
- Added handle_oov_mm_token parameter to multiple model classes for better multimodal token handling.
- Improved comments for clarity in various model files.
- Adjusted GPUARModelRunner to ensure proper handling of late interaction runner attributes.

Signed-off-by: [Your Name] <[Your Email]>

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…sed during shutdown

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…ling during draft model runs

- Changed variable name to better reflect its purpose in managing KV connector metadata.
- Updated comments for improved clarity regarding the handling of speculative configurations.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Removed unnecessary blank lines in data.py to improve code readability.

Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Contributor

Copilot AI left a comment

Pull request overview

Rebases the vllm-omni integration to vLLM v0.19.0 by aligning runner logic and type/import surfaces with upstream API changes, particularly around inputs typing, KV transfer metadata, and speculative decoding bookkeeping.

Changes:

  • Update worker runners to match upstream v0.19.0 execution/state-update flow (including deferred async spec-decode corrections and KV connector metadata handling).
  • Migrate multiple model/input imports from vllm.inputs.data / vllm.multimodal.inputs to vllm.inputs (see the import sketch after this list).
  • Adjust NPU runner to use the new vllm.v1.worker.mamba_utils.preprocess_mamba import path.
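
A minimal sketch of the import migration listed above (illustrative only; the symbols are the ones named in this review summary):

# Before the rebase (pre-v0.19.0 import surfaces):
#   from vllm.inputs.data import TokensPrompt
#   from vllm.multimodal.inputs import MultiModalDataDict
# After the rebase to v0.19.0, as described in this PR:
from vllm.inputs import MultiModalDataDict, TokensPrompt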

Reviewed changes

Copilot reviewed 47 out of 48 changed files in this pull request and generated 5 comments.

Show a summary per file
File: Description
vllm_omni/worker/gpu_model_runner.py: Updates persistent-batch state updates for v0.19.0, adds deferred async spec-decode correction hook, and adjusts tensor handling paths.
vllm_omni/worker/gpu_generation_model_runner.py: Adapts execute path to call deferred state-correction function and switches KV preemption handling to kv-connector metadata.
vllm_omni/worker/gpu_ar_model_runner.py: Aligns AR execution/sampling with upstream changes (state correction callback, speculative decoding arg changes).
vllm_omni/platforms/npu/worker/npu_ar_model_runner.py: Updates mamba preprocessing import/callsite for v0.19.0.
vllm_omni/patch.py: Updates TokensPrompt import path to new vllm.inputs surface.
vllm_omni/inputs/preprocess.py: Updates vLLM input type imports and return types to new vllm.inputs names.
vllm_omni/model_executor/models/voxtral_tts/voxtral_tts_audio_generation.py: Moves MultiModalDataDict import to vllm.inputs.
vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py: Migrates prompt-related imports to vllm.inputs.
vllm_omni/model_executor/models/qwen3_omni/qwen3_omni_moe_thinker.py: Migrates PromptType import to vllm.inputs.
vllm_omni/model_executor/models/mimo_audio/mimo_audio.py: Migrates multimodal typing imports to vllm.inputs.
vllm_omni/model_executor/models/mimo_audio/mimo_audio_llm.py: Migrates MultiModalDataDict import to vllm.inputs.
vllm_omni/model_executor/models/hunyuan_image3/hunyuan_image3.py: Migrates MultiModalDataDict import to vllm.inputs.
vllm_omni/model_executor/models/glm_image/glm_image_ar.py: Migrates MultiModalDataDict import to vllm.inputs.
vllm_omni/model_executor/models/cosyvoice3/cosyvoice3.py: Migrates MultiModalDataDict import to vllm.inputs.
vllm_omni/model_executor/models/bagel/bagel.py: Migrates MultiModalDataDict import to vllm.inputs.
vllm_omni/benchmarks/patch/patch.py: Updates benchmark patching; introduces pybase64 import.
tests/e2e/online_serving/test_mimo_audio.py: Adds module-level guard to avoid collection failures in restricted environments.
.buildkite/test-nightly.yml: Updates nightly pipeline image tag interpolation.


mrope_pos_ptr += completion_part_len

def _update_states(self, scheduler_output: "SchedulerOutput") -> None:
def _update_states(self, scheduler_output: "SchedulerOutput"):
Copilot AI Apr 3, 2026

_update_states previously had an explicit -> None return type, but now returns an optional deferred-correction callback (see later returns). Please add an explicit return annotation (e.g., Callable[[], None] | None) so the method contract is clear and type checkers don’t infer Any.
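
A minimal sketch of the suggested annotation (the callback shape follows the comment above; SchedulerOutput is the upstream type already referenced in the diff):

from collections.abc import Callable

def _update_states(
    self, scheduler_output: "SchedulerOutput"
) -> Callable[[], None] | None:
    # ... existing state updates ...
    # Return the deferred-correction callback, or None when nothing is deferred.
    ...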

Comment on lines +526 to +528
valid_sampled_token_count = self._get_valid_sampled_token_count()
if not valid_sampled_token_count:
return
Copilot AI Apr 3, 2026

valid_sampled_token_count is indexed like an array later, and is likely a NumPy array / Torch tensor. Using if not valid_sampled_token_count: can raise an “ambiguous truth value” error for arrays/tensors. Prefer an explicit check such as is None (or len(...) == 0 if it’s a list) to avoid runtime failures during deferred spec-decode correction.
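
A sketch of the explicit check suggested above, assuming the value is either None or an array/tensor-like object (the wrapper function name is hypothetical):

def _apply_deferred_spec_decode_correction(self):
    valid_sampled_token_count = self._get_valid_sampled_token_count()
    # Arrays/tensors have ambiguous truth values, so check explicitly instead
    # of `if not valid_sampled_token_count:`.
    if valid_sampled_token_count is None or len(valid_sampled_token_count) == 0:
        return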

Comment thread vllm_omni/benchmarks/patch/patch.py Outdated
from typing import Literal

import aiohttp
import pybase64 as base64
Copilot AI Apr 3, 2026

This introduces a hard dependency on pybase64, but it doesn’t appear to be declared anywhere in the repo’s dependency manifests. Either add pybase64 to the appropriate dependencies/extras or fall back to stdlib base64 when pybase64 isn’t installed so benchmarks don’t crash at import time.

Suggested change
import pybase64 as base64
try:
    import pybase64 as base64
except ImportError:
    import base64

Comment on lines +82 to +83
except Exception:
test_params = []
Copilot AI Apr 3, 2026

The broad except Exception + test_params = [] will silently skip all parametrized tests on any setup failure (e.g., transient network/auth issues), which can hide real regressions. Prefer pytest.skip(..., allow_module_level=True) (and/or catching specific expected exceptions) so skips are visible with a reason in CI output.

Suggested change
except Exception:
    test_params = []
except Exception as exc:
    pytest.skip(
        f"Skipping MiMo-Audio online serving tests because module setup "
        f"failed: {exc}",
        allow_module_level=True,
    )

Collaborator

Exactly. Please modify

Collaborator Author

fixed

Comment thread .buildkite/test-nightly.yml Outdated
podSpec:
containers:
- image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:c392ce21e9cf9ea65c52b866447793db10e0261c
- image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
Copilot AI Apr 3, 2026

This step uses ${BUILDKITE_COMMIT} while other steps in the same file use $BUILDKITE_COMMIT. Please make the syntax consistent to avoid surprising interpolation/escaping differences across shells/plugins.

Suggested change
- image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
- image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT

Collaborator

Exactly. Please modify this.

Collaborator Author

fixed

key: "build-wheel"
agents:
queue: cpu_queue_release
queue: cpu_queue_premerge
Collaborator

Per merge seems over frequent?

Collaborator Author

This only selects the cluster that runs the pipeline; the run frequency is controlled by the schedule in the pipeline settings.

Comment thread tests/engine/conftest.py Outdated
import pytest
from transformers import PretrainedConfig


Collaborator

Is the change in this file related to rebase?

Collaborator Author

removed

Comment thread vllm_omni/benchmarks/patch/patch.py Outdated
from typing import Literal

import aiohttp
import pybase64 as base64
Collaborator

Is it necessary to depend on it? Why not the original one?

Collaborator Author

fixed

num_inference_steps = 1
height = 1024
width = 1024
height = 512
Collaborator

Why change this?

Collaborator Author

The tests are run at 512×512; 1024×1024 causes a strange OOM error.

Comment thread vllm_omni/engine/arg_utils.py Outdated
for arch, (mod_folder, mod_relname, cls_name) in _OMNI_MODELS.items():
if arch not in supported_archs:
ModelRegistry.register_model(arch, f"vllm_omni.model_executor.models.{mod_folder}.{mod_relname}:{cls_name}")
ModelRegistry.register_model(arch, f"vllm_omni.model_executor.models.{mod_folder}.{mod_relname}:{cls_name}")
Collaborator

Is it related to the rebase?

Collaborator Author

fixed

sampling_params_list: Sequence[Any] | None = None,
final_stage_id: int = 0,
arrival_time: float | None = None,
lora_request: Any = None,
Collaborator

necessary extra params?

Collaborator Author

This is to align with the upstream AsyncLLM.add_request() and AsyncLLM.generate() in vLLM.
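
A hedged sketch of that alignment (the wrapper and engine attribute names below are hypothetical; only the forwarded parameters come from this diff):

from collections.abc import Sequence
from typing import Any

async def add_request(
    self,
    request_id: str,
    prompt: Any,
    params: Any,
    sampling_params_list: Sequence[Any] | None = None,
    final_stage_id: int = 0,
    arrival_time: float | None = None,
    lora_request: Any = None,
):
    # Forward the upstream-aligned keywords so the omni wrapper mirrors
    # vLLM's AsyncLLM.add_request()/generate() keyword surface.
    return await self.engine_client.add_request(
        request_id=request_id,
        prompt=prompt,
        params=params,
        arrival_time=arrival_time,
        lora_request=lora_request,
    )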

params=params,
supported_tasks=self.supported_tasks,
arrival_time=arrival_time,
lora_request=lora_request,
Collaborator

The same

Comment thread image_output.png Outdated
Collaborator

Remove this

Collaborator Author

removed

Gaohan123 and others added 7 commits April 4, 2026 08:20
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
…entation

- Updated VLLM_VERSION in pipeline-intel.yaml and Dockerfiles for ROCm and XPU to 0.19.0.
- Modified installation instructions in quickstart.md, cuda.inc.md, and rocm.inc.md to reflect the new version.
- Adjusted pre-built wheel download links and git checkout commands to point to version 0.19.0.

Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Apr 4, 2026
@Gaohan123
Collaborator

Please fix DCO. And it seems the pipeline modification leads to release CI failure.

tzhouam and others added 2 commits April 4, 2026 14:16
Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
….19.0 compat)

In vLLM 0.19.0, AutoConfig.from_pretrained reads model_type directly
from config.json before applying hf_overrides. For models with empty
config.json (e.g. CosyVoice3), this causes an "Unrecognized model" error.

Fix: detect empty configs and create a temporary patched config.json
with model_type injected, then set hf_config_path to point to it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: linyueqian <linyueqian@outlook.com>
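
A minimal sketch of the workaround described in that commit message (helper name and call flow are hypothetical, not the exact code from this PR):

import json
import os
import tempfile

def maybe_patch_empty_config(model_path: str, model_type: str) -> str | None:
    """Return a directory with a patched config.json if the original is empty."""
    config_path = os.path.join(model_path, "config.json")
    with open(config_path) as f:
        config = json.load(f)
    if config:
        return None  # config.json already declares model_type, nothing to patch
    tmp_dir = tempfile.mkdtemp(prefix="patched_hf_config_")
    with open(os.path.join(tmp_dir, "config.json"), "w") as f:
        json.dump({"model_type": model_type}, f)
    return tmp_dir  # point hf_config_path at this directory for AutoConfig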
Collaborator

@Gaohan123 Gaohan123 left a comment

LGTM. Thanks!

@Gaohan123 Gaohan123 merged commit 2804a85 into main Apr 4, 2026
9 checks passed
skf-1999 pushed a commit to Semmer2/vllm-omni that referenced this pull request Apr 7, 2026
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Labels

ready label to trigger buildkite CI


4 participants