vllm-project · tzhouam · Nov 17, 2025 · Nov 14, 2025 · Nov 15, 2025 · Nov 15, 2025
@@ -26,4 +26,4 @@ repos:
     rev: 7.1.1
     hooks:
       - id: flake8
-        args: ["--max-line-length=88", "--extend-ignore=E203,W503"]
+        args: ["--max-line-length=160", "--extend-ignore=E203,W503"]
@@ -100,15 +100,7 @@ uv pip install -e .
 
 ## Run examples (Qwen2.5-omni)
 
-Get into the example folder
-```bash
-cd examples/offline_inference/qwen_2_5_omni
-```
-Modify PYTHONPATH in run.sh as your path of vllm_omni. Then run.
-```bash
-bash run.sh
-```
-The output audio is saved in ./output_audio
+Please check the folder of [examples](examples)
 
 ## Further details
 

@@ -40,15 +40,15 @@ Example docstring:
 ```python
 class OmniLLM:
     """Main entry point for vLLM-omni inference.
-    
+
     This class provides a high-level interface for running multi-modal
     inference with non-autoregressive models.
-    
+
     Args:
         model: Model name or path
         stage_configs: Optional stage configurations
         **kwargs: Additional arguments passed to the engine
-        
+
     Example:
         >>> llm = OmniLLM(model="Qwen/Qwen2.5-Omni")
         >>> outputs = llm.generate(prompts="Hello")

@@ -66,4 +66,3 @@ Worker classes and model runners for distributed inference.
 - [vllm_omni.worker.gpu_model_runner.OmniGPUModelRunner][]
 - [vllm_omni.worker.gpu_ar_model_runner.GPUARModelRunner][]
 - [vllm_omni.worker.gpu_diffusion_model_runner.GPUDiffusionModelRunner][]
-
@@ -2,7 +2,7 @@
 
 This document contains comprehensive API design documentation for all core modules in vLLM-omni. These templates provide a standardized structure for designing and implementing the core, engine, executor, and worker modules.
 
-## 📋 Module API 
+## 📋 Module API
 
 ### Core Module API
 **Core module** provides fundamental scheduling, caching, and resource management functionality.

@@ -1,4 +1,4 @@
-# Architecture Overview 
+# Architecture Overview
 
 # Introduction
 

@@ -10,4 +10,3 @@ This section contains design documents and architecture specifications for vLLM-
 ## API Design Documentation
 
 - [vLLM-omni API Documentation](api_design_doc.md)
-
@@ -28,6 +28,5 @@
 
 ## <span class="twemoji">📚</span> Documentation Navigation
 
-- To run open-source models on vLLM-Omni, we recommend starting with the [:material-code-tags: User Quide](user_guide/getting_started/quickstart.md) 
+- To run open-source models on vLLM-Omni, we recommend starting with the [:material-code-tags: User Quide](user_guide/getting_started/quickstart.md)
 - To develop and contribute to vLLM-Omni, we recommend starting with the [:material-tools: Developer Guide](contributing/README.md)
-
@@ -103,11 +103,20 @@ def determine_other_files(self) -> list[Path]:
         if self.path.is_file():
             return []
         # Text file extensions that can be safely included
-        text_extensions = {".py", ".md", ".sh", ".yaml", ".yml", ".json", ".txt", ".toml", ".cfg", ".ini"}
-        is_other_file = lambda file: (
-            file.is_file() 
-            and file != self.main_file 
-            and file.suffix in text_extensions
+        text_extensions = {
+            ".py",
+            ".md",
+            ".sh",
+            ".yaml",
+            ".yml",
+            ".json",
+            ".txt",
+            ".toml",
+            ".cfg",
+            ".ini",
+        }
+        is_other_file = lambda file: (  # noqa: E731
+            file.is_file() and file != self.main_file and file.suffix in text_extensions
         )
         return [file for file in self.path.rglob("*") if is_other_file(file)]
 
@@ -172,7 +181,7 @@ def generate(self) -> str:
                 f"{code_fence}\n"
             )
         else:
-            with open(self.main_file) as f:
+            with open(self.main_file, encoding="utf-8") as f:
                 # Skip the title from md snippets as it's been included above
                 main_content = f.readlines()[1:]
             content += self.fix_relative_links("".join(main_content))

@@ -53,4 +53,3 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
 .md-nav__item--section:has([href*="api/vllm_omni/index"]) > .md-nav > .md-nav__list > .md-nav__item--nested > .md-nav > .md-nav__list > .md-nav__item--nested.md-nav__item--active > .md-nav {
     display: block;
 }
-
@@ -6,4 +6,3 @@ vLLM-omni's examples are split into two categories:
 - If you are using vLLM-omni from an HTTP application or client, see the *Online Serving* section.
 
 For detailed example documentation, check the [examples directory](https://github.com/vllm-project/vllm-omni/tree/main/examples) in the repository.
-
@@ -0,0 +1,64 @@
+# Offline Example of vLLM-omni for Qwen2.5-omni
+
+Source <https://github.com/vllm-project/vllm-omni/tree/main/examples\offline_inference\qwen2_5_omni>.
+
+
+## 🛠️ Installation
+
+Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/README.md)
+
+## Run examples (Qwen2.5-omni)
+### Multiple Prompts
+Download dataset from [seed_tts](https://drive.google.com/file/d/1GlSjVfSHkW3-leKKBlfrjuuTGqQ_xaLP/edit). To get the prompt, you can:
+```bash
+tar -xf <Your Download Path>/seedtts_testset.tar
+cp seedtts_testset/en/meta.lst examples/offline_inference/qwen2_5_omni/meta.lst
+python3 examples/offline_inference/qwen2_5_omni/extract_prompts.py \
+  --input examples/offline_inference/qwen2_5_omni/meta.lst \
+  --output examples/offline_inference/qwen2_5_omni/top100.txt \
+  --topk 100
+```
+Get into the example folder
+```bash
+cd examples/offline_inference/qwen2_5_omni
+```
+Then run the command below.
+```bash
+bash run_multiple_prompts.sh
+```
+### Single Prompts
+Get into the example folder
+```bash
+cd examples/offline_inference/qwen2_5_omni
+```
+Then run the command below.
+```bash
+bash run_single_prompt.sh
+```
+
+## Example materials
+
+??? abstract "end2end.py"
+    ``````py
+    --8<-- "examples\offline_inference\qwen2_5_omni\end2end.py"
+    ``````
+??? abstract "extract_prompts.py"
+    ``````py
+    --8<-- "examples\offline_inference\qwen2_5_omni\extract_prompts.py"
+    ``````
+??? abstract "processing_omni.py"
+    ``````py
+    --8<-- "examples\offline_inference\qwen2_5_omni\processing_omni.py"
+    ``````
+??? abstract "run_multiple_prompts.sh"
+    ``````sh
+    --8<-- "examples\offline_inference\qwen2_5_omni\run_multiple_prompts.sh"
+    ``````
+??? abstract "run_single_prompt.sh"
+    ``````sh
+    --8<-- "examples\offline_inference\qwen2_5_omni\run_single_prompt.sh"
+    ``````
+??? abstract "utils.py"
+    ``````py
+    --8<-- "examples\offline_inference\qwen2_5_omni\utils.py"
+    ``````
@@ -0,0 +1,35 @@
+# Online serving Example of vLLM-omni for Qwen2.5-omni
+
+Source <https://github.com/vllm-project/vllm-omni/blob/main/examples\online_serving\README.md>.
+
+
+## 🛠️ Installation
+
+Please refer to [README.md](https://github.com/vllm-project/vllm-omni/blob/main/README.md)
+
+## Run examples (Qwen2.5-omni)
+
+Launch the server
+```bash
+vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091
+```
+
+If you have custom stage configs file, launch the server with command below
+```bash
+vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
+```
+
+Get into the example folder
+```bash
+cd examples/online_serving
+```
+
+Send request via python
+```bash
+python openai_chat_completion_client_for_multimodal_generation.py
+```
+
+Send request via curl
+```bash
+bash run_curl_multimodal_generation.sh
+```
@@ -0,0 +1,7 @@
+# OpenAI Chat Completion Client For Multimodal Generation
+
+Source <https://github.com/vllm-project/vllm-omni/blob/main/examples\online_serving\openai_chat_completion_client_for_multimodal_generation.py>.
+
+``````py
+--8<-- "examples\online_serving\openai_chat_completion_client_for_multimodal_generation.py"
+``````
@@ -53,4 +53,3 @@ The output audio is saved in ./output_audio
 - Read the [architecture documentation](../../contributing/design_documents/vllm_omni_design.md)
 - Check out the [API reference](../../api/overview.md)
 - Explore the [examples](../examples/index.md)
-
@@ -0,0 +1,34 @@
+# Offline Example of vLLM-omni for Qwen2.5-omni
+
+## 🛠️ Installation
+
+Please refer to [README.md](../../../README.md)
+
+## Run examples (Qwen2.5-omni)
+### Multiple Prompts
+Download dataset from [seed_tts](https://drive.google.com/file/d/1GlSjVfSHkW3-leKKBlfrjuuTGqQ_xaLP/edit). To get the prompt, you can:
+```bash
+tar -xf <Your Download Path>/seedtts_testset.tar
+cp seedtts_testset/en/meta.lst examples/offline_inference/qwen2_5_omni/meta.lst
+python3 examples/offline_inference/qwen2_5_omni/extract_prompts.py \
+  --input examples/offline_inference/qwen2_5_omni/meta.lst \
+  --output examples/offline_inference/qwen2_5_omni/top100.txt \
+  --topk 100
+```
+Get into the example folder
+```bash
+cd examples/offline_inference/qwen2_5_omni
+```
+Then run the command below.
+```bash
+bash run_multiple_prompts.sh
+```
+### Single Prompts
+Get into the example folder
+```bash
+cd examples/offline_inference/qwen2_5_omni
+```
+Then run the command below.
+```bash
+bash run_single_prompt.sh
+```
@@ -7,8 +7,8 @@
 import soundfile as sf
 import torch
 from utils import make_omni_prompt
-from vllm.sampling_params import SamplingParams
 
+from vllm.sampling_params import SamplingParams
 from vllm_omni.entrypoints.omni_llm import OmniLLM
 
 _os_env_toggle.environ["VLLM_USE_V1"] = "1"
@@ -109,10 +109,14 @@ def parse_args():
     parser.add_argument("--use-torchvision", action="store_true")
     parser.add_argument("--tokenize", action="store_true")
     parser.add_argument(
-        "--output-wav", default="output.wav", help="[Deprecated] Output wav directory (use --output-dir)."
+        "--output-wav",
+        default="output.wav",
+        help="[Deprecated] Output wav directory (use --output-dir).",
     )
     parser.add_argument(
-        "--output-dir", default="outputs", help="Output directory to save text and wav files together."
+        "--output-dir",
+        default="outputs",
+        help="Output directory to save text and wav files together.",
     )
     parser.add_argument(
         "--thinker-hidden-states-dir",
@@ -168,7 +172,9 @@ def main():
         raise
 
     if args.prompts is None:
-        raise ValueError("No prompts provided. Use --prompts ... or --txt-prompts <file.txt> (with --prompt_type text)")
+        raise ValueError(
+            "No prompts provided. Use --prompts ... or --txt-prompts <file.txt> (with --prompt_type text)"
+        )
     omni_llm = OmniLLM(
         model=model_name,
         log_stats=args.enable_stats,
@@ -217,7 +223,9 @@ def main():
     omni_outputs = omni_llm.generate(prompt, sampling_params_list)
 
     # Determine output directory: prefer --output-dir; fallback to --output-wav
-    output_dir = args.output_dir if getattr(args, "output_dir", None) else args.output_wav
+    output_dir = (
+        args.output_dir if getattr(args, "output_dir", None) else args.output_wav
+    )
     os.makedirs(output_dir, exist_ok=True)
     for stage_outputs in omni_outputs:
         if stage_outputs.final_output_type == "text":
Original file line number	Diff line number	Diff line change
Expand Up		@@ -66,4 +66,3 @@ Worker classes and model runners for distributed inference.
		- [vllm_omni.worker.gpu_model_runner.OmniGPUModelRunner][]
		- [vllm_omni.worker.gpu_ar_model_runner.GPUARModelRunner][]
		- [vllm_omni.worker.gpu_diffusion_model_runner.GPUDiffusionModelRunner][]
Original file line number	Diff line number	Diff line change
Expand Up		@@ -10,4 +10,3 @@ This section contains design documents and architecture specifications for vLLM-
		## API Design Documentation

		- [vLLM-omni API Documentation](api_design_doc.md)
Original file line number	Diff line number	Diff line change
Expand Up		@@ -53,4 +53,3 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
		.md-nav__item--section:has([href*="api/vllm_omni/index"]) > .md-nav > .md-nav__list > .md-nav__item--nested > .md-nav > .md-nav__list > .md-nav__item--nested.md-nav__item--active > .md-nav {
		display: block;
		}
Original file line number	Diff line number	Diff line change
Expand Up		@@ -6,4 +6,3 @@ vLLM-omni's examples are split into two categories:
		- If you are using vLLM-omni from an HTTP application or client, see the Online Serving section.

		For detailed example documentation, check the [examples directory](https://github.com/vllm-project/vllm-omni/tree/main/examples) in the repository.
Original file line number	Diff line number	Diff line change
Expand Up		@@ -53,4 +53,3 @@ The output audio is saved in ./output_audio
		- Read the [architecture documentation](../../contributing/design_documents/vllm_omni_design.md)
		- Check out the [API reference](../../api/overview.md)
		- Explore the [examples](../examples/index.md)