Closed
39 commits
c7a3ce4
[Refactor] make torch profiler aligned with upstream cli
gcanlin Feb 7, 2026
2149f41
clean
gcanlin Feb 7, 2026
d372027
update
gcanlin Feb 7, 2026
0a66ff6
clean
gcanlin Feb 7, 2026
7e4ea95
lint
gcanlin Feb 7, 2026
c450de6
Merge branch 'main' into profiler-cli
gcanlin Feb 26, 2026
b200138
inherit vllm ProfilerConfig
gcanlin Feb 26, 2026
98ad751
lint
gcanlin Feb 26, 2026
a6968ab
fix
gcanlin Feb 26, 2026
dd6813b
update examples
gcanlin Feb 26, 2026
8944784
add profiler config in qwen-omni
gcanlin Feb 26, 2026
733de3b
add qwen omni examples
gcanlin Feb 26, 2026
58e626a
Merge branch 'main' into profiler-cli
gcanlin Feb 26, 2026
baf4e6e
fix lint
gcanlin Feb 26, 2026
644fb72
update example
gcanlin Feb 27, 2026
6a8570a
update docs
gcanlin Feb 27, 2026
f6914b4
Merge branch 'main' into profiler-cli
gcanlin Feb 27, 2026
a2b1aaa
fix lint
gcanlin Feb 27, 2026
b320cf4
clean env
gcanlin Feb 27, 2026
4806fba
update api server & docs
gcanlin Feb 27, 2026
69c329c
unify profiler
gcanlin Feb 28, 2026
96d4173
Merge branch 'main' into profiler-cli
gcanlin Mar 4, 2026
a33c6b9
fix lint
gcanlin Mar 5, 2026
f1f128b
update
gcanlin Mar 5, 2026
be83458
example
gcanlin Mar 5, 2026
676d2ec
Merge branch 'main' into profiler-cli
gcanlin Mar 5, 2026
3d2c71a
update docs and examples
gcanlin Mar 5, 2026
c95d2b2
NPU temp
gcanlin Mar 5, 2026
c0796ef
refacotr npu
gcanlin Mar 6, 2026
7668801
fix
gcanlin Mar 6, 2026
f311c5a
Merge branch 'main' into profiler-cli
gcanlin Mar 15, 2026
e22baa0
fix lint
gcanlin Mar 15, 2026
17da063
fix lint
gcanlin Mar 15, 2026
14593d8
fix inline engine bug
gcanlin Mar 15, 2026
6f5bd1a
update examples
gcanlin Mar 15, 2026
40ba4bd
fix docs
gcanlin Mar 15, 2026
3f00058
Merge branch 'main' into pr-1261
gcanlin Mar 17, 2026
26c0504
update
gcanlin Mar 17, 2026
04af2b4
lint
gcanlin Mar 17, 2026
128 changes: 100 additions & 28 deletions docs/contributing/profiling.md
@@ -4,46 +4,56 @@

vLLM-Omni uses the PyTorch Profiler to analyze performance across both **multi-stage omni-modality models** and **diffusion models**.

### 1. Set the Output Directory
Before running any script, set this environment variable. The system detects this and automatically saves traces here.

```bash
export VLLM_TORCH_PROFILER_DIR=./profiles
### 1. Configure Profiling in the Stage YAML

Enable profiling by adding `profiler_config` under `engine_args` for the stage(s) you want to profile in your stage config YAML:

```yaml
stage_args:
- stage_id: 0
stage_type: llm
engine_args:
# ... other engine args ...
profiler_config:
profiler: torch
torch_profiler_dir: ./perf
```

### 2. Profiling Omni-Modality Models
| Field | Description |
|---|---|
| `profiler` | Profiler backend to use. Currently supports `torch`. |
| `torch_profiler_dir` | Directory where trace files are saved. Created automatically if it doesn't exist. |

It is best to limit profiling to one iteration to keep trace files manageable.
> **Tip:** Only enable `profiler_config` on stages you actually need to profile. Stages without it will not start a profiler, keeping overhead minimal.

```bash
export VLLM_PROFILER_MAX_ITERS=1
```
### 2. Profiling Omni-Modality Models

**Selective Stage Profiling**
By default the profiler runs across all stages, but it is highly recommended to profile specific stages by passing a `stages` list, which keeps trace files from growing too large:

It is highly recommended to profile specific stages to prevent producing overly large trace files:

```python
# Profile all stages
omni_llm.start_profile()

# Only profile Stage 1
omni_llm.start_profile(stages=[1])
```

```python
# Stage 0 (Thinker) and Stage 2 (Audio Decoder) for qwen omni
omni_llm.start_profile(stages=[0, 2])
```

> **Important:** Always pass the same `stages` list to both `start_profile()` and `stop_profile()`. If you omit `stages` from `stop_profile()`, it defaults to stopping all stages — including ones that were never started — which will produce errors.

**Python Usage**: Wrap your generation logic with `start_profile()` and `stop_profile()`.

```python
from vllm_omni import omni_llm

profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
profiler_stages = [0] # Only profile the stages you need

# 1. Start profiling if enabled
if profiler_enabled:
omni_llm.start_profile(stages=[0])
# 1. Start profiling
omni_llm.start_profile(stages=profiler_stages)

# Initialize generator
omni_generator = omni_llm.generate(prompts, sampling_params_list, py_generator=args.py_generator)
@@ -64,7 +74,8 @@ for stage_outputs in omni_generator:
print(f"[Info] Processed {processed_count}/{total_requests}. Stopping profiler inside active loop...")

# Stop the profiler while workers are still active
omni_llm.stop_profile()
# Pass the same stages list used in start_profile()
omni_llm.stop_profile(stages=profiler_stages)

# Wait for traces to flush to disk
print("[Info] Waiting 30s for workers to write trace files to disk...")
@@ -75,24 +86,38 @@ omni_llm.close()
```


**CLI Usage** (using `end2end.py`):
```bash
# Profile only Stage 0 (Thinker)
python end2end.py --output-wav output_audio \
--query-type text --profiler-dir ./profile --profiler-stages 0

# Profile Stage 0 and Stage 2
python end2end.py --output-wav output_audio \
--query-type text --profiler-dir ./profile --profiler-stages 0 2

# Profile all stages (omit --profiler-stages)
python end2end.py --output-wav output_audio \
--query-type text --profiler-dir ./profile
```

**Examples**:

1. **Qwen2.5-Omni**: [https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/qwen2_5_omni/end2end.py](https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/qwen2_5_omni/end2end.py)

2. **Qwen3-Omni**: [https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/qwen3_omni/end2end.py](https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/qwen3_omni/end2end.py)


### 3. Profiling diffusion models

Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding.
Diffusion profiling is End-to-End, capturing encoding, denoising loops, and decoding. Standalone diffusion scripts use `--profiler-dir` to enable profiling.

**CLI Usage:**
```python

```bash
python image_to_video.py \
--model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
--image qwen-bear.png \
--prompt "A cat playing with yarn, smooth motion" \
--profiler-dir ./profile \
\
# Minimize Spatial Dimensions (Optional but helpful):
# Drastically reduces memory usage so the profiler doesn't
@@ -122,25 +147,72 @@ python image_to_video.py \
--flow-shift 12.0 \
--fps 16 \
--output i2v_output.mp4

```

> **Note:** For diffusion stages within a multi-stage omni pipeline, use `profiler_config` in the stage YAML instead (see Section 1).

**Examples**:

1. **Qwen image edit**: [https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/image_to_image/image_edit.py](https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/image_to_image/image_edit.py)

2. **Wan-AI/Wan2.2-I2V-A14B-Diffusers**: [https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_video](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_video)

### 4. Analyzing Omni Traces
### 4. Profiling Online Serving

When `profiler_config` is set in the stage YAML, the server automatically exposes `/start_profile` and `/stop_profile` HTTP endpoints.

**1. Start the server** with a stage YAML that has `profiler_config` enabled:
```bash
vllm serve Qwen/Qwen2.5-Omni-7B \
--omni \
--stage-configs-path qwen2_5_omni.yaml \
--port 8091
```

Or, for single-stage diffusion models:

```bash
vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091 --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
```

**2. Start profiling** by sending a POST request:
```bash
# Profile all stages that have profiler_config set
curl -X POST http://localhost:8091/start_profile

# Profile specific stages only
curl -X POST http://localhost:8091/start_profile \
-H "Content-Type: application/json" \
-d '{"stages": [0]}'
```

**3. Send your inference requests** as normal while the profiler is running.

**4. Stop profiling** and collect traces:
```bash
# Stop all stages
curl -X POST http://localhost:8091/stop_profile

# Stop specific stages (must match the stages you started)
curl -X POST http://localhost:8091/stop_profile \
-H "Content-Type: application/json" \
-d '{"stages": [0]}'
```

Trace files are written to the `torch_profiler_dir` specified in your stage YAML.

> **Important:** Always stop the same stages you started. Stopping a stage that was never started will produce errors.
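
The start/stop pairing described above can also be scripted. The following is a minimal sketch, assuming the server from step 1 is reachable at `http://localhost:8091` and accepts the same optional `{"stages": [...]}` JSON body as the `curl` commands; the helper names `build_profile_request`, `profiled`, and `_post` are hypothetical and not part of vLLM-Omni.

```python
import json
import urllib.request
from contextlib import contextmanager

BASE_URL = "http://localhost:8091"  # assumption: same host/port as the curl examples


def build_profile_request(action, stages=None, base_url=BASE_URL):
    """Build a POST request for /start_profile or /stop_profile."""
    if action not in ("start", "stop"):
        raise ValueError(f"unknown action: {action!r}")
    url = f"{base_url}/{action}_profile"
    if stages is None:
        # No body: applies to every stage that has profiler_config set.
        return urllib.request.Request(url, method="POST")
    body = json.dumps({"stages": list(stages)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def _post(request):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


@contextmanager
def profiled(stages=None, send=_post):
    """Start profiling on entry and stop the same stages on exit."""
    send(build_profile_request("start", stages))
    try:
        yield
    finally:
        # Reuse the exact stages value so start and stop always match.
        send(build_profile_request("stop", stages))
```

Wrapping the inference calls in `with profiled(stages=[0]):` keeps the two requests symmetric even if a request raises, which is the failure mode the note above warns about.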

### 5. Analyzing Traces

Output files are saved to your configured ```VLLM_TORCH_PROFILER_DIR```.
Output files are saved to the `torch_profiler_dir` specified in your stage YAML config.

**Output**
**Chrome Trace** (```.json.gz```): Visual timeline of kernels and stages. Open in Perfetto UI.
**Chrome Trace** (`.json.gz`): Visual timeline of kernels and stages. Open in Perfetto UI.

**Viewing Tools:**

- [Perfetto](https://ui.perfetto.dev/)(recommended)
- ```chrome://tracing```(Chrome only)
- [Perfetto](https://ui.perfetto.dev/) (recommended)
- `chrome://tracing` (Chrome only)

**Note**: vLLM-Omni reuses the PyTorch Profiler infrastructure from vLLM. See the official vLLM profiler documentation: [vLLM Profiling Guide](https://docs.vllm.ai/en/stable/contributing/profiling/)
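
For a quick look at a trace without opening a viewer, a short script can rank events by total time. This is a rough sketch, assuming the `.json.gz` file follows the Chrome Trace Event format (a top-level `traceEvents` list, or a bare list, in which complete events have `"ph": "X"` and a `dur` in microseconds); the function name `total_duration_by_name` is hypothetical.

```python
import gzip
import json
from collections import Counter


def total_duration_by_name(trace_path, top=10):
    """Return the `top` event names ranked by summed duration in microseconds."""
    with gzip.open(trace_path, "rt", encoding="utf-8") as handle:
        trace = json.load(handle)
    # Chrome traces are either an object holding "traceEvents" or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = Counter()
    for event in events:
        # "X" marks a complete event, which carries its duration in "dur".
        if event.get("ph") == "X":
            totals[event.get("name", "<unnamed>")] += event.get("dur", 0)
    return totals.most_common(top)
```

Call it with the path of one trace file from your `torch_profiler_dir`. Nested events are each counted in full, so the totals overlap and are only a coarse ranking, not an exclusive-time breakdown.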
@@ -81,6 +81,14 @@ def parse_args() -> argparse.Namespace:
parser.add_argument("--resolution", type=int, default=640)
parser.add_argument("--color-format", type=str, default="RGB")

# Profiler Options
parser.add_argument(
"--profiler_dir",
type=str,
default=None,
help="Directory to save torch profiler traces. Enables profiling when set.",
)

# Acceleration + Optimization Options
parser.add_argument("--cache-dit-fn-compute-blocks", type=int, default=1)
parser.add_argument("--cache-dit-bn-compute-blocks", type=int, default=0)
@@ -146,6 +154,14 @@ async def main():
else:
cache_config = None

# ---- Profiler Config ----
profiler_config = None
if args.profiler_dir:
profiler_config = {
"profiler": "torch",
"torch_profiler_dir": args.profiler_dir,
}

# ---- Initialize Omni ----
omni = Omni(
model=args.model,
@@ -158,12 +174,13 @@ async def main():
enable_cpu_offload=args.enable_cpu_offload,
diffusion_load_format="dummy",
custom_pipeline_args={"pipeline_class": "custom_pipeline.CustomPipeline"},
profiler_config=profiler_config,
)

print(">>> Pipeline loaded successfully")

# ---- Profiling + Info ----
profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
profiler_enabled = args.profiler_dir is not None
print(f"\n{'=' * 60}")
print("Generation Configuration")
print(f"Model: {args.model}")
18 changes: 16 additions & 2 deletions examples/offline_inference/image_to_image/image_edit.py
@@ -325,6 +325,12 @@ def parse_args() -> argparse.Namespace:
action="store_true",
help="Enable layerwise (blockwise) offloading on DiT modules.",
)
parser.add_argument(
"--profiler-dir",
type=str,
default=None,
help="Enables profiling when set.",
)
return parser.parse_args()


@@ -378,6 +384,14 @@ def main():
# Note: coefficients will use model-specific defaults based on model_type
}

# Build profiler config from CLI arg
profiler_config = None
if args.profiler_dir:
profiler_config = {
"profiler": "torch",
"torch_profiler_dir": args.profiler_dir,
}

# Initialize Omni with appropriate pipeline
omni = Omni(
model=args.model,
@@ -389,11 +403,11 @@ def main():
parallel_config=parallel_config,
enforce_eager=args.enforce_eager,
enable_cpu_offload=args.enable_cpu_offload,
profiler_config=profiler_config,
)
print("Pipeline loaded")

# Check if profiling is requested via environment variable
profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
profiler_enabled = args.profiler_dir is not None

# Time profiling for generation
print(f"\n{'=' * 60}")
21 changes: 16 additions & 5 deletions examples/offline_inference/image_to_video/image_to_video.py
@@ -27,7 +27,6 @@
"""

import argparse
import os
import time
from pathlib import Path

@@ -136,6 +135,12 @@ def parse_args() -> argparse.Namespace:
action="store_true",
help="Disable torch.compile and force eager execution.",
)
parser.add_argument(
"--profiler-dir",
type=str,
default=None,
help="Enables profiling when set.",
)
parser.add_argument(
"--audio-sample-rate",
type=int,
@@ -225,6 +230,15 @@ def main():
# Resize image to target dimensions
image = image.resize((width, height), PIL.Image.Resampling.LANCZOS)

# Build profiler config from CLI arg
profiler_config = None
if args.profiler_dir:
profiler_config = {
"profiler": "torch",
"torch_profiler_dir": args.profiler_dir,
}

profiler_enabled = args.profiler_dir is not None
# Configure cache based on backend type
cache_config = None
if args.cache_backend == "cache_dit":
@@ -256,8 +270,6 @@ def main():
"rel_l1_thresh": 0.2,
}

# Check if profiling is requested via environment variable
profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
parallel_config = DiffusionParallelConfig(
ulysses_degree=args.ulysses_degree,
ring_degree=args.ring_degree,
@@ -278,11 +290,10 @@ def main():
enable_cpu_offload=args.enable_cpu_offload,
parallel_config=parallel_config,
enforce_eager=args.enforce_eager,
profiler_config=profiler_config,
model_class_name=model_class_name,
cache_backend=args.cache_backend,
cache_config=cache_config,
)

if profiler_enabled:
print("[Profiler] Starting profiling...")
omni.start_profile()
19 changes: 16 additions & 3 deletions examples/offline_inference/qwen2_5_omni/end2end.py
@@ -377,9 +377,9 @@ def main(args):
for i, prompt in enumerate(prompts):
prompt["modalities"] = output_modalities

profiler_enabled = bool(os.getenv("VLLM_TORCH_PROFILER_DIR"))
profiler_enabled = args.enable_profiler is not None
Comment thread
gcanlin marked this conversation as resolved.
Collaborator

args.enable_profiler does not exist — the argparse argument is --profiler-dir, so the attribute is args.profiler_dir. This will crash with AttributeError.

Also --profiler-dir below uses action="store_true" (boolean), but the other examples (text_to_image.py, qwen3_omni/end2end.py) use type=str so it’s an actual directory path. Should be consistent.

Suggested change
profiler_enabled = args.enable_profiler is not None
profiler_enabled = args.profiler_dir is not None

Collaborator Author

For omni models, this will be unified to `enable_profiler`, because `profiler_dir` needs to be defined in the YAML config.
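
The two argparse behaviours behind this thread can be checked in isolation. The sketch below is illustrative only: it uses standard `argparse`, the `--profiler-dir` and `--profiler-stages` flags as the reviewer describes them, and a hypothetical `--enable-profiler` flag standing in for the naming proposed in the reply.

```python
import argparse

# Path-valued flag, as in the other examples: unset means "profiling off".
parser = argparse.ArgumentParser()
parser.add_argument("--profiler-dir", type=str, default=None)
parser.add_argument("--profiler-stages", type=int, nargs="+", default=None)
args = parser.parse_args(["--profiler-dir", "./profile", "--profiler-stages", "0", "2"])

# argparse derives the attribute name from the flag: dashes become underscores.
profiler_enabled = args.profiler_dir is not None
# No flag named --enable-profiler was declared on this parser, so the attribute is absent.
has_enable_profiler = hasattr(args, "enable_profiler")

# Hypothetical boolean flag: a store_true option defaults to False, never None.
flag_parser = argparse.ArgumentParser()
flag_parser.add_argument("--enable-profiler", action="store_true", default=False)
flag_args = flag_parser.parse_args([])
none_check = flag_args.enable_profiler is not None  # True although the flag was not passed
truthy_check = bool(flag_args.enable_profiler)  # False, which is the intended result
```

If the flag does become a boolean, the `is not None` test would enable profiling on every run, so a plain truthiness check is the safer form.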

if profiler_enabled:
omni_llm.start_profile(stages=[0])
omni_llm.start_profile(stages=args.profiler_stages)
omni_generator = omni_llm.generate(prompts, sampling_params_list, py_generator=args.py_generator)

# Determine output directory: prefer --output-dir; fallback to --output-wav
@@ -419,7 +419,7 @@ def main(args):
if profiler_enabled and processed_count >= total_requests:
print(f"[Info] Processed {processed_count}/{total_requests}. Stopping profiler inside active loop...")
# Stop the profiler while workers are still alive
omni_llm.stop_profile()
omni_llm.stop_profile(stages=args.profiler_stages)

print("[Info] Waiting 30s for workers to write massive trace files to disk...")
time.sleep(30)
@@ -539,6 +539,19 @@ def parse_args():
default=False,
help="Use py_generator mode. The returned type of Omni.generate() is a Python Generator object.",
)
parser.add_argument(
"--profiler-dir",
action="store_true",
default=False,
help="Enable torch profiler traces. Enables profiling when set.",
)
parser.add_argument(
"--profiler-stages",
type=int,
nargs="+",
default=None,
help="Stage IDs to profile (e.g. --profiler-stages 0 1 2). If not set, profiles all stages.",
)
return parser.parse_args()

