[Rebase] Rebase to vLLM v0.19.0 #2475
Conversation
- Update Dockerfile.ci to install vLLM from a specific commit wheel; add flashinfer/cublas/numpy dependencies.
- Fix worker_type leak in omni_stage by using pop instead of get.

Made-with: Cursor
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…orker_type retrieval in omni_stage.py. Removed unnecessary dependency installations for flashinfer and numpy. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r and numpy, and create a symlink for python3. This enhances the CI environment setup. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…nto a single RUN statement, improving readability and efficiency of the CI environment setup. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…mmands into a single RUN statement, improving readability and efficiency of the CI environment setup." This reverts commit 1c0a71e. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…yaml for improved performance Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Updated import for OpenAIServingEmbedding in api_server.py.
- Enhanced omni_init_app_state to initialize the renderer for engine_client.
- Added handle_oov_mm_token parameter to multiple model classes for better multimodal token handling.
- Improved comments for clarity in various model files.
- Adjusted GPUARModelRunner to ensure proper handling of late-interaction runner attributes.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…sed during shutdown Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…ling during draft model runs - Changed variable name to better reflect its purpose in managing KV connector metadata. - Updated comments for improved clarity regarding the handling of speculative configurations. Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Removed unnecessary blank lines in data.py to improve code readability. Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
There was a problem hiding this comment.
Pull request overview
Rebases the vllm-omni integration to vLLM v0.19.0 by aligning runner logic and type/import surfaces with upstream API changes, particularly around inputs typing, KV transfer metadata, and speculative decoding bookkeeping.
Changes:
- Update worker runners to match upstream v0.19.0 execution/state-update flow (including deferred async spec-decode corrections and KV connector metadata handling).
- Migrate multiple model/input imports from `vllm.inputs.data` / `vllm.multimodal.inputs` to `vllm.inputs`.
- Adjust the NPU runner to use the new `vllm.v1.worker.mamba_utils.preprocess_mamba` import path.
Reviewed changes
Copilot reviewed 47 out of 48 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/worker/gpu_model_runner.py | Updates persistent-batch state updates for v0.19.0, adds deferred async spec-decode correction hook, and adjusts tensor handling paths. |
| vllm_omni/worker/gpu_generation_model_runner.py | Adapts execute path to call deferred state-correction function and switches KV preemption handling to kv-connector metadata. |
| vllm_omni/worker/gpu_ar_model_runner.py | Aligns AR execution/sampling with upstream changes (state correction callback, speculative decoding arg changes). |
| vllm_omni/platforms/npu/worker/npu_ar_model_runner.py | Updates mamba preprocessing import/callsite for v0.19.0. |
| vllm_omni/patch.py | Updates TokensPrompt import path to new vllm.inputs surface. |
| vllm_omni/inputs/preprocess.py | Updates vLLM input type imports and return types to new vllm.inputs names. |
| vllm_omni/model_executor/models/voxtral_tts/voxtral_tts_audio_generation.py | Moves MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py | Migrates prompt-related imports to vllm.inputs. |
| vllm_omni/model_executor/models/qwen3_omni/qwen3_omni_moe_thinker.py | Migrates PromptType import to vllm.inputs. |
| vllm_omni/model_executor/models/mimo_audio/mimo_audio.py | Migrates multimodal typing imports to vllm.inputs. |
| vllm_omni/model_executor/models/mimo_audio/mimo_audio_llm.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/hunyuan_image3/hunyuan_image3.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/glm_image/glm_image_ar.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/cosyvoice3/cosyvoice3.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/model_executor/models/bagel/bagel.py | Migrates MultiModalDataDict import to vllm.inputs. |
| vllm_omni/benchmarks/patch/patch.py | Updates benchmark patching; introduces pybase64 import. |
| tests/e2e/online_serving/test_mimo_audio.py | Adds module-level guard to avoid collection failures in restricted environments. |
| .buildkite/test-nightly.yml | Updates nightly pipeline image tag interpolation. |
```diff
 mrope_pos_ptr += completion_part_len

-def _update_states(self, scheduler_output: "SchedulerOutput") -> None:
+def _update_states(self, scheduler_output: "SchedulerOutput"):
```
_update_states previously had an explicit -> None return type, but now returns an optional deferred-correction callback (see later returns). Please add an explicit return annotation (e.g., Callable[[], None] | None) so the method contract is clear and type checkers don’t infer Any.
```python
valid_sampled_token_count = self._get_valid_sampled_token_count()
if not valid_sampled_token_count:
    return
```
valid_sampled_token_count is indexed like an array later, and is likely a NumPy array / Torch tensor. Using if not valid_sampled_token_count: can raise an “ambiguous truth value” error for arrays/tensors. Prefer an explicit check such as is None (or len(...) == 0 if it’s a list) to avoid runtime failures during deferred spec-decode correction.
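The failure mode can be reproduced without NumPy/Torch; `FakeTensor` below is a hypothetical stand-in whose `__bool__` mimics multi-element tensor semantics, and `should_defer` shows the explicit check:

```python
class FakeTensor:
    """Stand-in mimicking torch.Tensor / np.ndarray truthiness semantics."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __bool__(self):
        # Multi-element tensors refuse implicit truth-value conversion.
        if len(self.data) != 1:
            raise RuntimeError(
                "Boolean value of a tensor with more than one element is ambiguous"
            )
        return bool(self.data[0])


def should_defer(valid_sampled_token_count):
    # Explicit check: safe for None, lists, arrays, and tensors alike.
    return valid_sampled_token_count is not None and len(valid_sampled_token_count) > 0
```

`if not valid_sampled_token_count:` raises on `FakeTensor([1, 2])`, while the explicit `is not None` / `len(...)` check does not.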
```diff
 from typing import Literal

 import aiohttp
+import pybase64 as base64
```
This introduces a hard dependency on pybase64, but it doesn’t appear to be declared anywhere in the repo’s dependency manifests. Either add pybase64 to the appropriate dependencies/extras or fall back to stdlib base64 when pybase64 isn’t installed so benchmarks don’t crash at import time.
```diff
-import pybase64 as base64
+try:
+    import pybase64 as base64
+except ImportError:
+    import base64
```
```python
except Exception:
    test_params = []
```
The broad except Exception + test_params = [] will silently skip all parametrized tests on any setup failure (e.g., transient network/auth issues), which can hide real regressions. Prefer pytest.skip(..., allow_module_level=True) (and/or catching specific expected exceptions) so skips are visible with a reason in CI output.
```diff
-except Exception:
-    test_params = []
+except Exception as exc:
+    pytest.skip(
+        f"Skipping MiMo-Audio online serving tests because module setup "
+        f"failed: {exc}",
+        allow_module_level=True,
+    )
```
```diff
 podSpec:
   containers:
-    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:c392ce21e9cf9ea65c52b866447793db10e0261c
+    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
```
This step uses ${BUILDKITE_COMMIT} while other steps in the same file use $BUILDKITE_COMMIT. Please make the syntax consistent to avoid surprising interpolation/escaping differences across shells/plugins.
```diff
-    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
+    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
```
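For context, both spellings expand identically in POSIX shells; braces only matter when the variable name abuts a following word character, so the consistency fix is purely stylistic (sketch with an illustrative value):

```shell
BUILDKITE_COMMIT=c392ce2

# Identical expansions:
echo "tag:$BUILDKITE_COMMIT"
echo "tag:${BUILDKITE_COMMIT}"

# Braces become necessary only when a valid name character follows;
# without them the shell expands the (unset) BUILDKITE_COMMIT_suffix:
echo "tag:${BUILDKITE_COMMIT}_suffix"
```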
Exactly. Please modify this.
```diff
 key: "build-wheel"
 agents:
-  queue: cpu_queue_release
+  queue: cpu_queue_premerge
```
Running this on every merge seems too frequent?
This only chooses the cluster that runs the pipeline; the frequency is controlled by the schedule in the pipeline settings.
```diff
 podSpec:
   containers:
-    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:c392ce21e9cf9ea65c52b866447793db10e0261c
+    - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}
```
Exactly. Please modify this.
```python
except Exception:
    test_params = []
```
```python
import pytest
from transformers import PretrainedConfig
```
Is the change in this file related to the rebase?
```diff
 from typing import Literal

 import aiohttp
+import pybase64 as base64
```
Is it necessary to depend on it? Why not use the original one?
```diff
 num_inference_steps = 1
-height = 1024
-width = 1024
+height = 512
```
The tests are run at 512×512; 1024×1024 causes a strange OOM error.
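As a rough, hypothetical illustration of why the resolution matters: diffusion-style latent (and attention) footprints grow quadratically with edge length, so 1024×1024 needs about 4× the memory of 512×512 (function name, `vae_scale`, and `channels` are illustrative assumptions, not this repo's actual model parameters):

```python
def latent_elements(height, width, vae_scale=8, channels=4):
    """Hypothetical latent-tensor element count for a diffusion-style model.

    Spatial dimensions are downscaled by the VAE factor, so element count
    scales with (height * width) and quadruples when both edges double.
    """
    return channels * (height // vae_scale) * (width // vae_scale)
```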
```diff
 for arch, (mod_folder, mod_relname, cls_name) in _OMNI_MODELS.items():
     if arch not in supported_archs:
-        ModelRegistry.register_model(arch, f"vllm_omni.model_executor.models.{mod_folder}.{mod_relname}:{cls_name}")
+        ModelRegistry.register_model(arch, f"vllm_omni.model_executor.models.{mod_folder}.{mod_relname}:{cls_name}")
```
Is this related to the rebase?
```diff
     sampling_params_list: Sequence[Any] | None = None,
     final_stage_id: int = 0,
     arrival_time: float | None = None,
+    lora_request: Any = None,
```
This is to align with the upstream AsyncLLM.add_request() and AsyncLLM.generate() in vllm.
```diff
     params=params,
     supported_tasks=self.supported_tasks,
     arrival_time=arrival_time,
+    lora_request=lora_request,
```
…vllm-omni into dev/rebase-v0.19.0
…entation - Updated VLLM_VERSION in pipeline-intel.yaml and Dockerfiles for ROCm and XPU to 0.19.0. - Modified installation instructions in quickstart.md, cuda.inc.md, and rocm.inc.md to reflect the new version. - Adjusted pre-built wheel download links and git checkout commands to point to version 0.19.0. Signed-off-by: Taichang Zhou <tzhouam@connect.ust.hk>
…vllm-omni into dev/rebase-v0.19.0
Please fix the DCO check. Also, the pipeline modification seems to be causing the release CI failure.
….19.0 compat) In vLLM 0.19.0, AutoConfig.from_pretrained reads model_type directly from config.json before applying hf_overrides. For models with empty config.json (e.g. CosyVoice3), this causes "Unrecognized model" error. Fix: detect empty configs and create a temporary patched config.json with model_type injected, then set hf_config_path to point to it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: linyueqian <linyueqian@outlook.com>
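A self-contained sketch of the described workaround (function name and structure are illustrative, not the actual patch): detect a config.json that is empty or missing `model_type`, write a patched copy into a temporary directory, and return that directory for use as `hf_config_path`.

```python
import json
import os
import tempfile


def patched_hf_config_path(config_path: str, model_type: str) -> str:
    """Return a directory whose config.json carries model_type.

    If the original config.json is empty, unreadable, or lacks model_type,
    write a patched copy with model_type injected into a temp directory and
    return it; otherwise return the original config's directory unchanged.
    """
    try:
        with open(config_path) as f:
            cfg = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        cfg = {}
    if cfg.get("model_type"):
        return os.path.dirname(config_path)
    cfg["model_type"] = model_type
    tmpdir = tempfile.mkdtemp(prefix="patched_hf_config_")
    with open(os.path.join(tmpdir, "config.json"), "w") as f:
        json.dump(cfg, f)
    return tmpdir
```

Pointing `hf_config_path` at the returned directory lets `AutoConfig.from_pretrained` find a `model_type` before `hf_overrides` are applied.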
Purpose
This PR aims to rebase the vllm version that this repo relies on to v0.19.0
Test Plan
Testing on release pipeline
https://buildkite.com/vllm/vllm-omni-rebase/builds/713/steps/canvas
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation edits to `./docs`.