Merged
33 commits
d613864
[Model] Add HSDP support for LTX-2 (#2899)
fywc Apr 20, 2026
c2859e9
[Revert] drop Wan2.2 prompt-length enforcement from #2847 (#2877)
david6666666 Apr 20, 2026
7e28eda
[Bugfix] Fix GLM-Image output dimensions and image edit pipeline (#2…
JaredforReal Apr 20, 2026
23b2a95
[Docs] Add Wan2.2 image-to-video recipe for Ascend NPU (A2/A3) (#2919)
gcanlin Apr 20, 2026
6128f6d
[Example] Add Hunyuan-Image3 end2end.py and README.md (#2590)
kechengliu97 Apr 20, 2026
851c513
fix ci
yinpeiqi Apr 20, 2026
461bddc
CI: publish Omni images to a separate Docker Hub repository (#2829)
sheralskumar Apr 20, 2026
524fe49
Merge remote-tracking branch 'upstream/main' into support-stage-scale…
yinpeiqi Apr 20, 2026
21d1c8e
Merge branch 'main' into support-stage-scale-out
yinpeiqi Apr 20, 2026
71d81d4
[Enhancement] add pytorch profiler ops and memory record (#2472)
bjf-frz Apr 20, 2026
99fa92e
Merge branch 'main' into support-stage-scale-out
yinpeiqi Apr 20, 2026
dc8a9e2
[Model] feat: FastGen DMD2-distilled Wan 2.1 pipelines (T2V, I2V) …
ayushag-nv Apr 20, 2026
27ae9d7
fix for ci
yinpeiqi Apr 20, 2026
a4b2015
add nightly replica test
yinpeiqi Apr 20, 2026
400690f
[CI] Remove small resolution test in Qwen-Image Perf test when vae pa…
wtomin Apr 20, 2026
618268d
[Bugfix] Truncate mimo-audio code2wav prompt to MAX_CODE2WAV_TOKENS (…
lishunyang12 Apr 20, 2026
e076378
[Feat][sleepmode] add omni sleepmode and ack protocol (#2022)
Flink-ddd Apr 21, 2026
46093bd
update gpu utilization
yinpeiqi Apr 21, 2026
de4e472
[CI][Bugfix] Improve cosine similarity calculation by incorporating l…
yenuo26 Apr 21, 2026
c1ba86a
[BugFix] Fix the issue with stream=True (#2955)
amy-why-3459 Apr 21, 2026
cc1005e
update init
yinpeiqi Apr 21, 2026
fc29ebd
update init
yinpeiqi Apr 21, 2026
7f75ae1
update test
yinpeiqi Apr 21, 2026
52b5336
[Enhancement] Engine runtime errors (#2426)
pi314ever Apr 21, 2026
ad7c966
[BugFix] add missing subtalker sampling config to Qwen3-TTS deploy YA…
xiaohajiayou Apr 21, 2026
204da62
Merge remote-tracking branch 'upstream/main' into support-stage-scale…
yinpeiqi Apr 21, 2026
ff1b698
add import
yinpeiqi Apr 21, 2026
6e80ecc
add tests
yinpeiqi Apr 21, 2026
7fbb010
add test
yinpeiqi Apr 21, 2026
7cc8aca
fix error
yinpeiqi Apr 21, 2026
cec1216
fix by comments
yinpeiqi Apr 21, 2026
06ac245
fix error
yinpeiqi Apr 21, 2026
dceaff3
add rep id in client
yinpeiqi Apr 21, 2026
5 changes: 3 additions & 2 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -64,12 +64,13 @@ while true; do
done

echo "--- Pulling container"
-## Temporary change to use AMD Docker Hub to store the vllm-ci image
+## Temporary change to use AMD Docker Hub to store the vllm-omni image
# to bypass the rate limit issue with ECR Public Gallery.
+# Images are now stored in a separate repository for vllm-omni, instead of vllm-ci.
# TODO: @tjtanaa point back to ECR Public Gallery
# once the amd agents are configured to use ECR Public Gallery.
# image_name="public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:${BUILDKITE_COMMIT}-rocm-omni"
-image_name="rocm/vllm-ci:${BUILDKITE_COMMIT}-rocm-omni"
+image_name="rocm/vllm-omni:${BUILDKITE_COMMIT}"
container_name="rocm_${BUILDKITE_COMMIT}_$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"

# TODO: @tjtanaa uncomment this once the amd agents are configured to use ECR Public Gallery.
14 changes: 14 additions & 0 deletions .buildkite/test-amd.yaml
@@ -117,3 +117,17 @@ steps:
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- pytest -s -v tests/e2e/online_serving/test_image_gen_edit.py


- label: "Omni Sleep Mode Test"
  timeout_in_minutes: 40
  agent_pool: mi325_2
  depends_on: amd-build
  mirror_hardwares: [amdproduction]
  grade: Blocking
  commands:
    - export GPU_ARCHS=gfx942
    - export VLLM_LOGGING_LEVEL=DEBUG
    - export VLLM_WORKER_MULTIPROC_METHOD=spawn
    - export VLLM_TEST_CLEAN_GPU_MEMORY="1"
    - pytest -s -v tests/e2e/offline_inference/test_omni_sleep_mode.py -m "advanced_model and omni and MI325" --run-level "advanced_model"
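
The `-m` option in the command above takes a boolean expression over pytest marker names, so this job should only collect tests that carry all three markers. The sketch below is a simplified stand-in for that selection rule, not pytest's actual implementation. `matches_markers` is a hypothetical helper, and it assumes marker names are plain identifiers combined with `and`, `or`, and `not`.

```python
import ast


def matches_markers(expression: str, markers: set[str]) -> bool:
    """Evaluate an and/or/not expression over marker names (simplified model)."""

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not evaluate(node.operand)
        if isinstance(node, ast.Name):
            # A bare name is true when the test carries that marker.
            return node.id in markers
        raise ValueError(f"unsupported syntax in marker expression: {expression!r}")

    return evaluate(ast.parse(expression, mode="eval"))


expr = "advanced_model and omni and MI325"
print(matches_markers(expr, {"advanced_model", "omni", "MI325"}))  # True
print(matches_markers(expr, {"advanced_model", "omni", "H100"}))   # False
```

Under this model, a test marked only for H100 is skipped by the MI325 job, which is why each hardware pool can run the same test file with its own marker expression.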
40 changes: 40 additions & 0 deletions .buildkite/test-merge.yml
@@ -360,6 +360,46 @@ steps:
path: /mnt/hf-cache
type: DirectoryOrCreate

- label: "Omni Sleep Mode Test with H100"
  timeout_in_minutes: 30
  depends_on: upload-merge-pipeline
  commands:
    - export VLLM_TEST_CLEAN_GPU_MEMORY="1"
    - pytest -s -v tests/e2e/offline_inference/test_omni_sleep_mode.py -m "advanced_model and H100 and omni" --run-level "advanced_model"
  agents:
    queue: "mithril-h100-pool"
  plugins:
    - kubernetes:
        podSpec:
          containers:
            - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
              resources:
                limits:
                  nvidia.com/gpu: 2
              volumeMounts:
                - name: devshm
                  mountPath: /dev/shm
                - name: hf-cache
                  mountPath: /root/.cache/huggingface
              env:
                - name: HF_HOME
                  value: /root/.cache/huggingface
                - name: HF_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-token-secret
                      key: token
          nodeSelector:
            node.kubernetes.io/instance-type: gpu-h100-sxm
          volumes:
            - name: devshm
              emptyDir:
                medium: Memory
            - name: hf-cache
              hostPath:
                path: /mnt/hf-cache
                type: DirectoryOrCreate

- label: "Voxtral-TTS E2E Test"
timeout_in_minutes: 20
depends_on: upload-merge-pipeline
38 changes: 38 additions & 0 deletions .buildkite/test-nightly.yml
@@ -151,6 +151,44 @@ steps:
type: DirectoryOrCreate


- label: ":full_moon: Omni Multi-Replica Startup Test with 4x H100"
  timeout_in_minutes: 45
  commands:
    - pytest -s -v tests/e2e/online_serving/test_qwen3_omni_multi_replicas.py -m "core_model" --run-level "core_model"
  agents:
    queue: "mithril-h100-pool"
  plugins:
    - kubernetes:
        podSpec:
          containers:
            - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
              resources:
                limits:
                  nvidia.com/gpu: 4
              volumeMounts:
                - name: devshm
                  mountPath: /dev/shm
                - name: hf-cache
                  mountPath: /root/.cache/huggingface
              env:
                - name: HF_HOME
                  value: /root/.cache/huggingface
                - name: HF_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: hf-token-secret
                      key: token
          nodeSelector:
            node.kubernetes.io/instance-type: gpu-h100-sxm
          volumes:
            - name: devshm
              emptyDir:
                medium: Memory
            - name: hf-cache
              hostPath:
                path: /mnt/hf-cache
                type: DirectoryOrCreate

- group: ":card_index_dividers: TTS Model Test"
key: nightly-tts-test-group
depends_on: upload-nightly-pipeline
2 changes: 1 addition & 1 deletion .buildkite/test-template-amd-omni.j2
@@ -3,7 +3,7 @@
Last synced: 2025-12-15
Modifications: Removed unused CUDA/NVIDIA logic, keeping only AMD tests
#}
-{% set docker_image_amd = "rocm/vllm-ci:$BUILDKITE_COMMIT-rocm-omni" %}
+{% set docker_image_amd = "rocm/vllm-omni:$BUILDKITE_COMMIT" %}
{% set default_working_dir = "/app/vllm-omni" %}

- group: "AMD Tests"
1 change: 1 addition & 0 deletions .gitignore
@@ -174,6 +174,7 @@ CLAUDE.md

# Codex
AGENTS.md
.codex
.codex/

# cursor
27 changes: 24 additions & 3 deletions docs/contributing/profiling.md
@@ -138,9 +138,27 @@ Single-stage diffusion serving with torch profiler:
vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers \
  --omni \
  --port 8091 \
-  --profiler-config '{"profiler": "torch", "torch_profiler_dir": "./vllm_profile"}'
+  --profiler-config '{
+    "profiler": "torch",
+    "torch_profiler_dir": "/tmp/vllm_profile_wan22_i2v",
+    "torch_profiler_with_stack": true,
+    "torch_profiler_with_flops": false,
+    "torch_profiler_use_gzip": true,
+    "torch_profiler_dump_cuda_time_total": true,
+    "torch_profiler_record_shapes": true,
+    "torch_profiler_with_memory": true,
+    "delay_iterations": 0,
+    "max_iterations": 0
+  }'
```

+Useful optional fields:
+
+- `torch_profiler_with_stack`: export `by_stack` operator views and stack text files.
+- `torch_profiler_record_shapes`: export `by_shape` operator views.
+- `torch_profiler_with_memory`: dump `memory_snapshot_rank*.pickle` when the backend supports memory history.
+- `torch_profiler_use_gzip`: write the trace as `trace_rank*.json.gz`.
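
Because the `--profiler-config` value is a JSON string, it can be easier to generate it than to hand-edit the quoted literal. The following is a minimal sketch that assumes only the field names shown in the command above. `build_profiler_config` is a hypothetical helper for illustration and is not part of vLLM-Omni.

```python
import json
import shlex


def build_profiler_config(trace_dir: str, *, with_stack: bool = True,
                          record_shapes: bool = True, with_memory: bool = True,
                          use_gzip: bool = True) -> str:
    """Return the JSON text for --profiler-config (field names from the docs above)."""
    config = {
        "profiler": "torch",
        "torch_profiler_dir": trace_dir,
        "torch_profiler_with_stack": with_stack,
        "torch_profiler_record_shapes": record_shapes,
        "torch_profiler_with_memory": with_memory,
        "torch_profiler_use_gzip": use_gzip,
    }
    return json.dumps(config)


config_json = build_profiler_config("/tmp/vllm_profile_wan22_i2v")
# shlex.quote wraps the JSON so it survives shell word splitting when pasted
# after --profiler-config.
print("--profiler-config", shlex.quote(config_json))
```

Turning off an optional field, for example `with_memory=False`, changes only the corresponding JSON key, which keeps the shell command itself unchanged.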

Single-stage diffusion serving with Nsight Systems:

```bash
@@ -177,8 +195,11 @@ For mixed-stage pipelines, use explicit `stages` and pass the same stage list to

Torch profiler output:

-- Chrome/Perfetto traces under `torch_profiler_dir`
-- Optional aggregated CUDA-time tables under the same directory
+- Chrome/Perfetto trace: `trace_rank*.json` or `trace_rank*.json.gz`
+- Excel workbook: `ops_rank*.xlsx` with `summary`, and optional `by_shape` / `by_stack` sheets
+- Stack exports: `stacks_cpu_rank*.txt` and `stacks_cuda_rank*.txt` when stack capture is enabled
+- Memory snapshot: `memory_snapshot_rank*.pickle` when memory capture is enabled and supported by the backend
+- Optional aggregated CUDA-time tables under the same session directory
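
To sanity-check a profiling run, the files listed above can be grouped by rank. This sketch assumes the file-name patterns from the list, that the `*` after `rank` is an integer rank id, and that all files sit somewhere under one session directory. `collect_profiler_outputs` is a hypothetical helper, not an API of the project.

```python
import re
from collections import defaultdict
from pathlib import Path

# File-name patterns taken from the output list; each captures the rank id.
# Treating the rank as digits only is an assumption.
PATTERNS = {
    "trace": re.compile(r"trace_rank(\d+)\.json(?:\.gz)?"),
    "ops_workbook": re.compile(r"ops_rank(\d+)\.xlsx"),
    "stacks": re.compile(r"stacks_(?:cpu|cuda)_rank(\d+)\.txt"),
    "memory_snapshot": re.compile(r"memory_snapshot_rank(\d+)\.pickle"),
}


def collect_profiler_outputs(session_dir: str) -> dict[str, dict[str, list[str]]]:
    """Map rank -> artifact kind -> sorted file names found under session_dir."""
    found: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for path in sorted(Path(session_dir).rglob("*")):
        if not path.is_file():
            continue
        for kind, pattern in PATTERNS.items():
            match = pattern.fullmatch(path.name)
            if match:
                found[match.group(1)][kind].append(path.name)
    # Convert nested defaultdicts to plain dicts for easier printing and comparison.
    return {rank: dict(kinds) for rank, kinds in found.items()}
```

A rank that has a trace but no `memory_snapshot` entry is not necessarily an error, since the list above notes that memory snapshots depend on backend support.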

CUDA profiler / Nsight Systems output:
