
[Rebase] Rebase to vllm v0.18.0 #2037

Merged
Gaohan123 merged 67 commits into main from dev/rebase-v0.18.0 on Mar 21, 2026

Conversation

@tzhouam (Collaborator) commented Mar 20, 2026

Purpose

This PR rebases the current vllm-omni codebase to align with vllm v0.18.0.

Test Plan

Verified via the Nightly Tests.

Test Result

All tests passed except Qwen Image Edit (#2036) and the CUDA unit-test errors inherited from the main branch.

Rebase Tests

(screenshot: rebase test results)

Main branch Nightly Test on the Same Commit

(screenshot: main-branch nightly test results)
Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and commands, or state the reasons if your code doesn't require additional test scripts. For test file guidelines, please check the test style doc.
  • The test results. Please paste a before/after comparison, or the e2e results.
  • (Optional) Any necessary documentation updates, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation edits to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


tzhouam added 30 commits March 10, 2026 01:46
- Update Dockerfile.ci to install vLLM from specific commit wheel,
  add flashinfer/cublas/numpy dependencies
- Fix worker_type leak in omni_stage by using pop instead of get

Made-with: Cursor

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…orker_type retrieval in omni_stage.py. Removed unnecessary dependency installations for flashinfer and numpy.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…r and numpy, and create a symlink for python3. This enhances the CI environment setup.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…nto a single RUN statement, improving readability and efficiency of the CI environment setup.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…mmands into a single RUN statement, improving readability and efficiency of the CI environment setup."

This reverts commit 1c0a71e.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…yaml for improved performance

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
- Updated import for OpenAIServingEmbedding in api_server.py.
- Enhanced omni_init_app_state to initialize renderer for engine_client.
- Added handle_oov_mm_token parameter to multiple model classes for better multimodal token handling.
- Improved comments for clarity in various model files.
- Adjusted GPUARModelRunner to ensure proper handling of late interaction runner attributes.

Signed-off-by: [Your Name] <[Your Email]>

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…sed during shutdown

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
…ling during draft model runs

- Changed variable name to better reflect its purpose in managing KV connector metadata.
- Updated comments for improved clarity regarding the handling of speculative configurations.

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@mergify (bot) commented Mar 20, 2026

⚠️ The sha of the head commit of this PR conflicts with #1003. Mergify cannot evaluate rules on this PR. Once #1003 is merged or closed, Mergify will resume processing this PR. ⚠️

@tzhouam (Collaborator, Author) commented Mar 20, 2026

@tjtanaa @xuechendi Please help review the AMD/Intel-related parts in PR #2038.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

io_processor_plugin = vllm_config.model_config.io_processor_plugin
self.io_processor = get_io_processor(vllm_config, io_processor_plugin)

P1: Pass renderer into get_io_processor in AsyncOmni init

vLLM 0.18 makes IO processors renderer-aware; this same commit already updates api_server.py:573-580 to call get_io_processor(vllm_config, renderer, io_processor_plugin), but AsyncOmni.__init__ still uses the old 2-argument form. Any AsyncOmni startup for a model that sets io_processor_plugin will now raise before serving/offline generation begins, so those entrypoints stay broken after the rebase.
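The fix the comment calls for can be sketched with stubs standing in for the real vLLM objects (the dict-based config and the body of `get_io_processor` here are illustrative placeholders, not vLLM's actual implementation; only the 3-argument call shape mirrors the review):

```python
# Hypothetical stub for the renderer-aware vLLM 0.18-style signature:
# renderer is now a positional argument between config and plugin name.
def get_io_processor(vllm_config, renderer, plugin_name):
    if plugin_name is None:
        return None
    return {"config": vllm_config, "renderer": renderer, "plugin": plugin_name}

class AsyncOmni:
    def __init__(self, vllm_config, renderer):
        plugin = vllm_config.get("io_processor_plugin")
        # The old 2-argument call get_io_processor(vllm_config, plugin)
        # would raise TypeError under the new signature; thread the
        # renderer through, as api_server.py already does.
        self.io_processor = get_io_processor(vllm_config, renderer, plugin)

engine = AsyncOmni({"io_processor_plugin": "demo"}, renderer="renderer-stub")
```

With the renderer passed through, startup for models that set `io_processor_plugin` no longer fails before serving begins.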


Comment on lines 133 to 136
prompt: OmniPromptType,
request_id: str,
sampling_params_list: Sequence[OmniSamplingParams] | None = None,
sampling_params: Any = None,
request_id: str = "",
*,

P2: Preserve AsyncOmni.generate positional argument order

This reorders the second positional parameter from request_id to sampling_params without a compatibility shim, but the repo still has callers using the old order, e.g. examples/offline_inference/voxtral_tts/end2end.py:72. In that path the request id is interpreted as sampling params, sampling_params_list is ignored, and generation runs under the empty default request_id, so the example no longer works as written.
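One way to keep old callers working is a small compatibility shim: if the second positional argument arrives as a string, it is almost certainly an old-style `request_id`. This is a hypothetical sketch, not the repo's code; the simplified `generate` below only returns a dict so the swap is observable:

```python
import warnings

def generate(prompt, sampling_params=None, request_id="", **kwargs):
    # Compatibility shim (hypothetical): old callers used
    # generate(prompt, request_id, ...), so a str in the second
    # positional slot is treated as a request id and swapped back.
    if isinstance(sampling_params, str) and not request_id:
        warnings.warn(
            "generate(prompt, request_id, ...) is deprecated; pass "
            "request_id as a keyword argument",
            DeprecationWarning,
        )
        sampling_params, request_id = None, sampling_params
    return {"prompt": prompt, "request_id": request_id,
            "sampling_params": sampling_params}

old_style = generate("hello", "req-42")                      # old positional order
new_style = generate("hello", {"temp": 0.7}, request_id="req-43")  # new order
```

With a shim like this, examples such as `examples/offline_inference/voxtral_tts/end2end.py` would keep working while emitting a deprecation warning.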


Comment on lines +522 to +528
if use_audio_in_video and "audio" in mm_prompt_updates:
filtered_updates = {k: v for k, v in mm_prompt_updates.items() if k != "audio"}
prompt_ids, mm_placeholders = self._apply_prompt_updates(
prompt_ids,
filtered_updates,
)
# Derive audio placeholders from video placeholders
mm_placeholders = self._derive_audio_from_video_placeholders(mm_placeholders, mm_item_counts)
mm_placeholders = self._derive_audio_from_video_placeholders(mm_placeholders, mm_prompt_updates)

P1: Derive audio placeholders for cached audio-in-video requests

In use_audio_in_video mode the audio placeholders still have to be synthesized from the video placeholders even when mm_prompt_updates has no 'audio' entry; the detection logic just above already treats that key as optional for cached items. By gating _derive_audio_from_video_placeholders(...) on 'audio' in mm_prompt_updates, cached Qwen3 requests fall into the else branch, keep only video placeholders, and then fail _validate_mm_placeholders because mm_item_counts still includes audio items.
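The control flow the comment describes can be sketched with toy stand-ins (all function bodies and placeholder values here are hypothetical; only the gating decision mirrors the review): derivation is keyed on `use_audio_in_video` itself, not on whether `mm_prompt_updates` happens to contain an `'audio'` entry.

```python
# Hypothetical stubs standing in for the Qwen3 processor internals.
def derive_audio_from_video(mm_placeholders):
    # Synthesize audio placeholders from the video placeholders.
    out = dict(mm_placeholders)
    out["audio"] = [f"audio-for-{v}" for v in mm_placeholders.get("video", [])]
    return out

def apply_updates(prompt_ids, updates):
    return prompt_ids, {k: [f"{k}-ph"] for k in updates}

def process(prompt_ids, mm_prompt_updates, use_audio_in_video):
    if use_audio_in_video:
        # Derive audio placeholders whenever audio-in-video is active,
        # even if cached items left no 'audio' entry in the updates.
        filtered = {k: v for k, v in mm_prompt_updates.items() if k != "audio"}
        prompt_ids, ph = apply_updates(prompt_ids, filtered)
        return prompt_ids, derive_audio_from_video(ph)
    return apply_updates(prompt_ids, mm_prompt_updates)

# Cached request: no 'audio' key, yet audio placeholders are still derived.
_, ph = process([1, 2], {"video": ["v0"]}, use_audio_in_video=True)
```

Gating on the mode rather than the key keeps cached requests consistent with `mm_item_counts`, which still includes the audio items.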


Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
Comment thread .buildkite/test-merge.yml

- label: "Diffusion Model Test"
timeout_in_minutes: 20
timeout_in_minutes: 30
Collaborator:

Why do we modify these settings?

Collaborator (Author):

The running time is slightly over 20 minutes; I suspect this is caused by the Docker image being much larger than on the main branch.

Comment thread .buildkite/test-ready.yml
- label: "Diffusion Model Test"
depends_on: upload-ready-pipeline
commands:
- timeout 20m pytest -s -v tests/e2e/offline_inference/test_t2i_model.py -m "core_model and diffusion" --run-level "core_model"
Collaborator:

The same question

Collaborator (Author):

Same timeout problem.

Comment thread vllm_omni/entrypoints/async_omni.py Outdated
trace_headers: Any = None,
priority: int = 0,
data_parallel_rank: int | None = None,
reasoning_ended: bool | None = None,
Collaborator:

Do we need these extra input params?

Collaborator (Author):

Removed.

)
self.device = torch.device(f"cuda:{self.local_rank}")
current_platform.set_device(self.device)
torch.accelerator.set_device_index(self.device)
Collaborator:

Is it hardware agnostic?

Collaborator (Author):

This aligns with the upstream changes.
(screenshot: upstream change)

Collaborator:

We will limit torch.accelerator to GPU-specific code for now; NPU doesn't fully support it yet. The change here is safe.

@Gaohan123 Gaohan123 added the ready label to trigger buildkite CI label Mar 20, 2026
…in async_omni.py

Signed-off-by: tzhouam <tzhouam@connect.ust.hk>
@gcanlin (Collaborator) commented Mar 20, 2026

@Gaohan123 @tzhouam I'm upgrading the NPU part as well. It's just a small change. Could I push it to this PR?

@princepride (Collaborator):

@tzhouam In #2047 I reverted some patches, PTAL.

Comment thread docker/Dockerfile.ci

RUN uv pip install --system --upgrade \
"flashinfer-cubin==0.6.6" \
"nvidia-cublas-cu12==12.9.1.4" \
Member:

Do we still need the cublas version upgrade?

@Gaohan123 (Collaborator):

This PR has passed all tests except one that was interrupted by the system due to a long image pull. Ready to merge.
https://buildkite.com/vllm/vllm-omni/builds/4651/steps/canvas

@Gaohan123 Gaohan123 merged commit a90a769 into main Mar 21, 2026
6 of 8 checks passed
hsliuustc0106 added a commit to hsliuustc0106/vllm-omni-skills that referenced this pull request Mar 22, 2026
### vllm-omni-audio-tts
- Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool
- Changes:
  - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool

### vllm-omni-perf
- Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool
- Changes:
  - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool

### vllm-omni-api
- Source: [PR #2058](vllm-project/vllm-omni#2058) - [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection
- Changes:
  - Bug fix: [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection

### vllm-omni-contrib
- Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example

### vllm-omni-cicd
- Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example

### vllm-omni-api
- Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model
- Changes:
  - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model

### vllm-omni-perf
- Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model
- Changes:
  - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model

### vllm-omni-contrib
- Source: [PR #2038](vllm-project/vllm-omni#2038) - [Doc] Update docs and dockerfiles for rebase of vllm v0.18.0

### vllm-omni-serving
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-contrib
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-api
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-cicd
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-cicd
- Source: [PR #2032](vllm-project/vllm-omni#2032) - [CI] Change Bagel online test environment variable `VLLM_TEST_CLEAN_GPU_MEMORY` to `0`

### vllm-omni-cicd
- Source: [PR #2031](vllm-project/vllm-omni#2031) - [CI] Fix test.
- Changes:
  - Bug fix: [CI] Fix test.

### vllm-omni-cicd
- Source: [PR #2017](vllm-project/vllm-omni#2017) - [CI] [ROCm] Setup `test-ready.yml` and `test-merge.yml`

### vllm-omni-cicd
- Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests

### vllm-omni-perf
- Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests

### vllm-omni-serving
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-image-gen
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-perf
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-serving
- Source: [PR #2009](vllm-project/vllm-omni#2009) - [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni
- Changes:
  - Bug fix: [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni

### vllm-omni-image-gen
- Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images
- Changes:
  - Bug fix: [Bugfix]Fix bug of online server can not return mutli images
- Additions:
  - Qwen-Image-Layered

### vllm-omni-api
- Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images
- Changes:
  - Bug fix: [Bugfix]Fix bug of online server can not return mutli images

### vllm-omni-cicd
- Source: [PR #1998](vllm-project/vllm-omni#1998) - [CI] Split BAGEL tests into dummy/real weight tiers (L2/L3)

### vllm-omni-serving
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-audio-tts
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-perf
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-serving
- Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue
- Changes:
  - Bug fix: [CI] [ROCm] Bugfix device environment issue

### vllm-omni-api
- Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue
- Changes:
  - Bug fix: [CI] [ROCm] Bugfix device environment issue

### vllm-omni-serving
- Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__
- Changes:
  - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__

### vllm-omni-cicd
- Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__
- Changes:
  - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__

### vllm-omni-api
- Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Changes:
  - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Additions:
  - `/v1/chat/completions`

### vllm-omni-perf
- Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Changes:
  - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)

### vllm-omni-contrib
- Source: [PR #1976](vllm-project/vllm-omni#1976) - [skip ci][Docs] Update WeChat QR code (fix filename case)
- Changes:
  - Bug fix: [skip ci][Docs] Update WeChat QR code (fix filename case)

### vllm-omni-contrib
- Source: [PR #1974](vllm-project/vllm-omni#1974) - [Docs] Update WeChat QR code for community support

### vllm-omni-cicd
- Source: [PR #1945](vllm-project/vllm-omni#1945) - Fix Base voice clone streaming quality and stop-token crash
- Changes:
  - Bug fix: Fix Base voice clone streaming quality and stop-token crash

### vllm-omni-cicd
- Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models
- Changes:
  - New feature: [Test] L4 complete diffusion feature test for Bagel models

### vllm-omni-perf
- Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models
- Changes:
  - New feature: [Test] L4 complete diffusion feature test for Bagel models

### vllm-omni-perf
- Source: [PR #1934](vllm-project/vllm-omni#1934) - Fix OmniGen2 transformer config loading for HF models
- Changes:
  - Bug fix: Fix OmniGen2 transformer config loading for HF models

### vllm-omni-audio-tts
- Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request

### vllm-omni-perf
- Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request

### vllm-omni-audio-tts
- Source: [PR #1926](vllm-project/vllm-omni#1926) - [Misc] removed qwen3_tts.py as it is out-dated

### vllm-omni-contrib
- Source: [PR #1920](vllm-project/vllm-omni#1920) - [Docs] Add Wan2.1-T2V as supported video generation models
- Changes:
  - New feature: [Docs] Add Wan2.1-T2V as supported video generation models

### vllm-omni-video-gen
- Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device
- Changes:
  - Bug fix: [Bugfix] fix helios video generate use cpu device

### vllm-omni-perf
- Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device
- Changes:
  - Bug fix: [Bugfix] fix helios video generate use cpu device

### vllm-omni-audio-tts
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-perf
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-api
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-perf
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-contrib
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-serving
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-cicd
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-image-gen
- Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family
- Changes:
  - New feature: [Feat] support HSDP for Flux family

### vllm-omni-contrib
- Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family
- Changes:
  - New feature: [Feat] support HSDP for Flux family

### vllm-omni-distributed
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-quantization
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-cicd
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-perf
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-contrib
- Source: [PR #1890](vllm-project/vllm-omni#1890) - [NPU] Upgrade to v0.17.0

### vllm-omni-contrib
- Source: [PR #1889](vllm-project/vllm-omni#1889) - Add `Governance` section
- Changes:
  - New feature: Add `Governance` section

### vllm-omni-distributed
- Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism
- Changes:
  - New feature: [Feat] Support T5 Tensor Parallelism

### vllm-omni-cicd
- Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism
- Changes:
  - New feature: [Feat] Support T5 Tensor Parallelism

Labels: ready (label to trigger buildkite CI)

5 participants