
[Rebase] Rebase to vllm v0.18.0 #2037

Merged
Gaohan123 merged 67 commits into main from dev/rebase-v0.18.0 on Mar 21, 2026

Conversation

@tzhouam (Collaborator) commented Mar 20, 2026

Purpose

This PR rebases the current vllm-omni codebase to align with vllm v0.18.0.

Test Plan

Verified via the Nightly Tests.

Test Result

All tests passed except Qwen Image Edit (#2036) and the CUDA unit-test errors inherited from the main branch.

Rebase Tests

(screenshot: rebase test results)

Main branch Nightly Test on the Same Commit

(screenshot: main-branch nightly test results)
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and commands, or state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste a before/after comparison, or the e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


tzhouam added 30 commits March 10, 2026 01:46
- Update Dockerfile.ci to install vLLM from specific commit wheel,
  add flashinfer/cublas/numpy dependencies
- Fix worker_type leak in omni_stage by using pop instead of get

Made-with: Cursor

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…orker_type retrieval in omni_stage.py. Removed unnecessary dependency installations for flashinfer and numpy.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r and numpy, and create a symlink for python3. This enhances the CI environment setup.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…nto a single RUN statement, improving readability and efficiency of the CI environment setup.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…mmands into a single RUN statement, improving readability and efficiency of the CI environment setup."

This reverts commit 1c0a71e.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…yaml for improved performance

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Updated import for OpenAIServingEmbedding in api_server.py.
- Enhanced omni_init_app_state to initialize renderer for engine_client.
- Added handle_oov_mm_token parameter to multiple model classes for better multimodal token handling.
- Improved comments for clarity in various model files.
- Adjusted GPUARModelRunner to ensure proper handling of late interaction runner attributes.

Signed-off-by: [Your Name] <[Your Email]>

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…sed during shutdown

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…ling during draft model runs

- Changed variable name to better reflect its purpose in managing KV connector metadata.
- Updated comments for improved clarity regarding the handling of speculative configurations.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@mergify (bot) commented Mar 20, 2026

⚠️ The sha of the head commit of this PR conflicts with #1003. Mergify cannot evaluate rules on this PR. Once #1003 is merged or closed, Mergify will resume processing this PR. ⚠️

@tzhouam (Collaborator, Author) commented Mar 20, 2026

@tjtanaa @xuechendi Please help review the AMD/Intel-related parts in PR #2038.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

io_processor_plugin = vllm_config.model_config.io_processor_plugin
self.io_processor = get_io_processor(vllm_config, io_processor_plugin)

P1: Pass renderer into get_io_processor in AsyncOmni init

vLLM 0.18 makes IO processors renderer-aware; this same commit already updates api_server.py:573-580 to call get_io_processor(vllm_config, renderer, io_processor_plugin), but AsyncOmni.__init__ still uses the old 2-argument form. Any AsyncOmni startup for a model that sets io_processor_plugin will now raise before serving/offline generation begins, so those entrypoints stay broken after the rebase.
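The fix the comment calls for can be sketched with stubs standing in for the real vLLM objects (the dict-based config and the body of `get_io_processor` here are illustrative placeholders, not vLLM's actual implementation; only the 3-argument call shape mirrors the review):

```python
# Hypothetical stub for the renderer-aware vLLM 0.18-style signature:
# renderer is now a positional argument between config and plugin name.
def get_io_processor(vllm_config, renderer, plugin_name):
    if plugin_name is None:
        return None
    return {"config": vllm_config, "renderer": renderer, "plugin": plugin_name}

class AsyncOmni:
    def __init__(self, vllm_config, renderer):
        plugin = vllm_config.get("io_processor_plugin")
        # The old 2-argument call get_io_processor(vllm_config, plugin)
        # would raise TypeError under the new signature; thread the
        # renderer through, as api_server.py already does.
        self.io_processor = get_io_processor(vllm_config, renderer, plugin)

engine = AsyncOmni({"io_processor_plugin": "demo"}, renderer="renderer-stub")
```

With the renderer passed through, startup for models that set `io_processor_plugin` no longer fails before serving begins.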


Comment on lines 133 to 136
prompt: OmniPromptType,
request_id: str,
sampling_params_list: Sequence[OmniSamplingParams] | None = None,
sampling_params: Any = None,
request_id: str = "",
*,

P2: Preserve AsyncOmni.generate positional argument order

This reorders the second positional parameter from request_id to sampling_params without a compatibility shim, but the repo still has callers using the old order, e.g. examples/offline_inference/voxtral_tts/end2end.py:72. In that path the request id is interpreted as sampling params, sampling_params_list is ignored, and generation runs under the empty default request_id, so the example no longer works as written.
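One way to keep old callers working is a small compatibility shim: if the second positional argument arrives as a string, it is almost certainly an old-style `request_id`. This is a hypothetical sketch, not the repo's code; the simplified `generate` below only returns a dict so the swap is observable:

```python
import warnings

def generate(prompt, sampling_params=None, request_id="", **kwargs):
    # Compatibility shim (hypothetical): old callers used
    # generate(prompt, request_id, ...), so a str in the second
    # positional slot is treated as a request id and swapped back.
    if isinstance(sampling_params, str) and not request_id:
        warnings.warn(
            "generate(prompt, request_id, ...) is deprecated; pass "
            "request_id as a keyword argument",
            DeprecationWarning,
        )
        sampling_params, request_id = None, sampling_params
    return {"prompt": prompt, "request_id": request_id,
            "sampling_params": sampling_params}

old_style = generate("hello", "req-42")                      # old positional order
new_style = generate("hello", {"temp": 0.7}, request_id="req-43")  # new order
```

With a shim like this, examples such as `examples/offline_inference/voxtral_tts/end2end.py` would keep working while emitting a deprecation warning.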


Comment on lines +522 to +528
if use_audio_in_video and "audio" in mm_prompt_updates:
filtered_updates = {k: v for k, v in mm_prompt_updates.items() if k != "audio"}
prompt_ids, mm_placeholders = self._apply_prompt_updates(
prompt_ids,
filtered_updates,
)
# Derive audio placeholders from video placeholders
mm_placeholders = self._derive_audio_from_video_placeholders(mm_placeholders, mm_item_counts)
mm_placeholders = self._derive_audio_from_video_placeholders(mm_placeholders, mm_prompt_updates)

P1: Derive audio placeholders for cached audio-in-video requests

In use_audio_in_video mode the audio placeholders still have to be synthesized from the video placeholders even when mm_prompt_updates has no 'audio' entry; the detection logic just above already treats that key as optional for cached items. By gating _derive_audio_from_video_placeholders(...) on 'audio' in mm_prompt_updates, cached Qwen3 requests fall into the else branch, keep only video placeholders, and then fail _validate_mm_placeholders because mm_item_counts still includes audio items.
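The control flow the comment describes can be sketched with toy stand-ins (all function bodies and placeholder values here are hypothetical; only the gating decision mirrors the review): derivation is keyed on `use_audio_in_video` itself, not on whether `mm_prompt_updates` happens to contain an `'audio'` entry.

```python
# Hypothetical stubs standing in for the Qwen3 processor internals.
def derive_audio_from_video(mm_placeholders):
    # Synthesize audio placeholders from the video placeholders.
    out = dict(mm_placeholders)
    out["audio"] = [f"audio-for-{v}" for v in mm_placeholders.get("video", [])]
    return out

def apply_updates(prompt_ids, updates):
    return prompt_ids, {k: [f"{k}-ph"] for k in updates}

def process(prompt_ids, mm_prompt_updates, use_audio_in_video):
    if use_audio_in_video:
        # Derive audio placeholders whenever audio-in-video is active,
        # even if cached items left no 'audio' entry in the updates.
        filtered = {k: v for k, v in mm_prompt_updates.items() if k != "audio"}
        prompt_ids, ph = apply_updates(prompt_ids, filtered)
        return prompt_ids, derive_audio_from_video(ph)
    return apply_updates(prompt_ids, mm_prompt_updates)

# Cached request: no 'audio' key, yet audio placeholders are still derived.
_, ph = process([1, 2], {"video": ["v0"]}, use_audio_in_video=True)
```

Gating on the mode rather than the key keeps cached requests consistent with `mm_item_counts`, which still includes the audio items.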


Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
Comment thread .buildkite/test-merge.yml

- label: "Diffusion Model Test"
timeout_in_minutes: 20
timeout_in_minutes: 30
Collaborator:

Why do we modify these settings?

Collaborator (Author):

The running time is slightly over 20 minutes; I suspect this is caused by the Docker image being much larger than on the main branch.

Comment thread .buildkite/test-ready.yml
- label: "Diffusion Model Test"
depends_on: upload-ready-pipeline
commands:
- timeout 20m pytest -s -v tests/e2e/offline_inference/test_t2i_model.py -m "core_model and diffusion" --run-level "core_model"
Collaborator:

The same question

Collaborator (Author):

Same timeout problem.

Comment thread vllm_omni/entrypoints/async_omni.py Outdated
trace_headers: Any = None,
priority: int = 0,
data_parallel_rank: int | None = None,
reasoning_ended: bool | None = None,
Collaborator:

Do we need these extra input params?

Collaborator (Author):

Removed.

)
self.device = torch.device(f"cuda:{self.local_rank}")
current_platform.set_device(self.device)
torch.accelerator.set_device_index(self.device)
Collaborator:

Is it hardware agnostic?

Collaborator (Author):

This aligns with the upstream changes.
(screenshot: upstream change)

Collaborator:

We will limit torch.accelerator to GPU-specific code for now; NPU doesn't fully support it yet. The change here is safe.

@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Mar 20, 2026
…in async_omni.py

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@gcanlin (Collaborator) commented Mar 20, 2026

@Gaohan123 @tzhouam I'm upgrading the NPU part as well. It's just a small change. Could I push it to this PR?

@princepride (Collaborator):

@tzhouam In #2047 I reverted some patches, PTAL.

Comment thread docker/Dockerfile.ci

RUN uv pip install --system --upgrade \
"flashinfer-cubin==0.6.6" \
"nvidia-cublas-cu12==12.9.1.4" \
Member:

Do we still need the cublas version upgrade?

@Gaohan123 (Collaborator):

This PR has passed all tests except one that was interrupted by the system due to a long image pull. Ready to merge.
https://buildkite.com/vllm/vllm-omni/builds/4651/steps/canvas

@Gaohan123 Gaohan123 merged commit a90a769 into main Mar 21, 2026
6 of 8 checks passed
hsliuustc0106 added a commit to hsliuustc0106/vllm-omni-skills that referenced this pull request Mar 22, 2026
### vllm-omni-audio-tts
- Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool
- Changes:
  - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool

### vllm-omni-perf
- Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool
- Changes:
  - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool

### vllm-omni-api
- Source: [PR #2058](vllm-project/vllm-omni#2058) - [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection
- Changes:
  - Bug fix: [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection

### vllm-omni-contrib
- Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example

### vllm-omni-cicd
- Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example

### vllm-omni-api
- Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model
- Changes:
  - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model

### vllm-omni-perf
- Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model
- Changes:
  - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model

### vllm-omni-contrib
- Source: [PR #2038](vllm-project/vllm-omni#2038) - [Doc] Update docs and dockerfiles for rebase of vllm v0.18.0

### vllm-omni-serving
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-contrib
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-api
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-cicd
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-cicd
- Source: [PR #2032](vllm-project/vllm-omni#2032) - [CI] Change Bagel online test environment variable `VLLM_TEST_CLEAN_GPU_MEMORY` to `0`

### vllm-omni-cicd
- Source: [PR #2031](vllm-project/vllm-omni#2031) - [CI] Fix test.
- Changes:
  - Bug fix: [CI] Fix test.

### vllm-omni-cicd
- Source: [PR #2017](vllm-project/vllm-omni#2017) - [CI] [ROCm] Setup `test-ready.yml` and `test-merge.yml`

### vllm-omni-cicd
- Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests

### vllm-omni-perf
- Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests

### vllm-omni-serving
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-image-gen
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-perf
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-serving
- Source: [PR #2009](vllm-project/vllm-omni#2009) - [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni
- Changes:
  - Bug fix: [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni

### vllm-omni-image-gen
- Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images
- Changes:
  - Bug fix: [Bugfix]Fix bug of online server can not return mutli images
- Additions:
  - Qwen-Image-Layered

### vllm-omni-api
- Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images
- Changes:
  - Bug fix: [Bugfix]Fix bug of online server can not return mutli images

### vllm-omni-cicd
- Source: [PR #1998](vllm-project/vllm-omni#1998) - [CI] Split BAGEL tests into dummy/real weight tiers (L2/L3)

### vllm-omni-serving
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-audio-tts
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-perf
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-serving
- Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue
- Changes:
  - Bug fix: [CI] [ROCm] Bugfix device environment issue

### vllm-omni-api
- Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue
- Changes:
  - Bug fix: [CI] [ROCm] Bugfix device environment issue

### vllm-omni-serving
- Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__
- Changes:
  - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__

### vllm-omni-cicd
- Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__
- Changes:
  - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__

### vllm-omni-api
- Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Changes:
  - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Additions:
  - `/v1/chat/completions`

### vllm-omni-perf
- Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Changes:
  - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)

### vllm-omni-contrib
- Source: [PR #1976](vllm-project/vllm-omni#1976) - [skip ci][Docs] Update WeChat QR code (fix filename case)
- Changes:
  - Bug fix: [skip ci][Docs] Update WeChat QR code (fix filename case)

### vllm-omni-contrib
- Source: [PR #1974](vllm-project/vllm-omni#1974) - [Docs] Update WeChat QR code for community support

### vllm-omni-cicd
- Source: [PR #1945](vllm-project/vllm-omni#1945) - Fix Base voice clone streaming quality and stop-token crash
- Changes:
  - Bug fix: Fix Base voice clone streaming quality and stop-token crash

### vllm-omni-cicd
- Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models
- Changes:
  - New feature: [Test] L4 complete diffusion feature test for Bagel models

### vllm-omni-perf
- Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models
- Changes:
  - New feature: [Test] L4 complete diffusion feature test for Bagel models

### vllm-omni-perf
- Source: [PR #1934](vllm-project/vllm-omni#1934) - Fix OmniGen2 transformer config loading for HF models
- Changes:
  - Bug fix: Fix OmniGen2 transformer config loading for HF models

### vllm-omni-audio-tts
- Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request

### vllm-omni-perf
- Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request

### vllm-omni-audio-tts
- Source: [PR #1926](vllm-project/vllm-omni#1926) - [Misc] removed qwen3_tts.py as it is out-dated

### vllm-omni-contrib
- Source: [PR #1920](vllm-project/vllm-omni#1920) - [Docs] Add Wan2.1-T2V as supported video generation models
- Changes:
  - New feature: [Docs] Add Wan2.1-T2V as supported video generation models

### vllm-omni-video-gen
- Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device
- Changes:
  - Bug fix: [Bugfix] fix helios video generate use cpu device

### vllm-omni-perf
- Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device
- Changes:
  - Bug fix: [Bugfix] fix helios video generate use cpu device

### vllm-omni-audio-tts
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-perf
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-api
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-perf
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-contrib
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-serving
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-cicd
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-image-gen
- Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family
- Changes:
  - New feature: [Feat] support HSDP for Flux family

### vllm-omni-contrib
- Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family
- Changes:
  - New feature: [Feat] support HSDP for Flux family

### vllm-omni-distributed
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-quantization
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-cicd
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-perf
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-contrib
- Source: [PR #1890](vllm-project/vllm-omni#1890) - [NPU] Upgrade to v0.17.0

### vllm-omni-contrib
- Source: [PR #1889](vllm-project/vllm-omni#1889) - Add `Governance` section
- Changes:
  - New feature: Add `Governance` section

### vllm-omni-distributed
- Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism
- Changes:
  - New feature: [Feat] Support T5 Tensor Parallelism

### vllm-omni-cicd
- Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism
- Changes:
  - New feature: [Feat] Support T5 Tensor Parallelism

Labels: ready (label to trigger buildkite CI)

5 participants