Merged
Commits
129 commits
c2ae58b
[Bugfix] fix mindiesd laserattention unsupported error (#2673)
fan2956 Apr 10, 2026
fbb5dd5
[Bugfix]: modify diffusion pipeline profiler result in videos (#2647)
bjf-frz Apr 10, 2026
78bef62
[Profiler] Add Nsight Systems support for serving (#1098)
ahengljh Apr 10, 2026
687405c
[Config] Remove invalid LLM-only engine_args from diffusion stage con…
ianliuy Apr 10, 2026
2bc183f
[Refactor] Remove dependency on librosa (#2273)
NickCao Apr 10, 2026
a41174e
[Model] VoxCPM2 native AR TTS support (#2658)
linyueqian Apr 11, 2026
001f2e3
[BUG FIX]: prevent EngineCore crash when Qwen TTS Base task is missin…
teith Apr 11, 2026
d1fef41
[Doc] Add LTX-2 online serving deployment recipes with optimization b…
SamitHuang Apr 11, 2026
c9e8411
[feature] : add cache-dit for stable-audio-open-1.0 (#1341)
akshatvishu Apr 11, 2026
25c0566
[ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync …
tjtanaa Apr 11, 2026
eccee21
[Perf] Use global CUDA graph pool for MiMo Audio (#2657)
NickCao Apr 11, 2026
f7e8df9
[TTS][OmniVoice] Add voice cloning support for OmniVoice TTS (#2676)
JuanPZuluaga Apr 11, 2026
6e93595
[CI] [Resource] Remove unused test cases to cutdown agent resources u…
tjtanaa Apr 11, 2026
c20cac8
[Bugfix] Restore user config/runtime stage init timeout (#2519)
yuanheng-zhao Apr 11, 2026
38dfe56
[Bugfix] Validate speaker in chat endpoint and fix case-insensitive l…
reidliu41 Apr 12, 2026
73fb68a
[Docs] Update WeChat QR code for community support (#2701)
david6666666 Apr 12, 2026
5d58abb
[Log] Wire stat loggers into AsyncOmniEngine to match AsyncLLM (#2551)
gcanlin Apr 12, 2026
ef230ac
[Bugfix] Fix Incompatible Multihook Integration (TeaCache <-> CPU Off…
alex-jw-brooks Apr 12, 2026
16041ab
[Refactor] Extend CFG Parallel to support 3 or 4 branch dispatch acro…
zzhuoxin1508 Apr 12, 2026
95b5b2e
[Bugfix] Fix UT for the missing of log_stats in Engine (#2706)
gcanlin Apr 12, 2026
2dce028
[ROCm] [CI] Fix environment issue (#2708)
tjtanaa Apr 12, 2026
eb1a801
[Feat] Override single stage CLI args when stage_configs_path is set …
timzsu Apr 13, 2026
e122501
[Bugfix] Fix Bagel online mode for 1. Hang after several requests 2…
natureofnature Apr 13, 2026
cb4d13a
[Perf][Fish Speech] Enable CUDA Graph capture for Fast AR code predic…
Sy0307 Apr 13, 2026
8097747
[Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path (#…
Celeste-jq Apr 13, 2026
d9e745c
[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API (#2…
linyueqian Apr 13, 2026
2226143
[CI][Bugfix] Refactor the test case to add support for increasing ini…
yenuo26 Apr 13, 2026
2b70e89
[Revert] Revert "[Log] Wire stat loggers into AsyncOmniEngine to matc…
amy-why-3459 Apr 13, 2026
0d4e975
[core]refactor communication layer: PR1(Added Refactor Infra Only) (#…
natureofnature Apr 13, 2026
cd2761e
[Feature]: support Flux.2-dev tea_cache (#1871)
nuclearwu Apr 13, 2026
155583f
[Bugfix] Release stage launch lock before handshake (#2717)
fake0fan Apr 13, 2026
ef3f72b
[Tests][Qwen3-Omni]Modify Qwen3-Omni performance test cases (#2600)
amy-why-3459 Apr 13, 2026
2c67c30
[Bagel]: Support `think mode` in single stage deployment of Bagel (#2…
princepride Apr 13, 2026
e0cdbe9
[Misc] Cleanup: use consistent pytest-mock in unit tests (#2698)
yuanheng-zhao Apr 13, 2026
2a1d506
[skip ci][doc]Update async_chunk design diagram (#2420)
amy-why-3459 Apr 13, 2026
6b5a52a
[Bugfix] Update Flux2-dev & Dynin_omni L4 e2e test (#2723)
wtomin Apr 13, 2026
c9e2e3e
[Voxtral TTS] Correct decode steps param in Voxtral TTS (#2524)
y123456y78 Apr 13, 2026
14f7910
[Perf]: Speedup VoxCPM2 TTS performance and Support PagedAttention (#…
Sy0307 Apr 13, 2026
dd13891
[Voxtral TTS] Fix Voxtral TTS input with text and ref_audio (#2750)
y123456y78 Apr 13, 2026
8d23549
[CI] Qwen image edit performance benckmark (#2216)
fhfuih Apr 14, 2026
a5b38b5
[BugFix] Remove stage_configs_path validation (#2741)
amy-why-3459 Apr 14, 2026
644edac
[Perf] Optimize MP4 encoding latency in video generation (#2735)
SamitHuang Apr 14, 2026
48c30bc
[Qwen3-TTS] Remove hardcoded `distributed_executor_backend` to improv…
iancarrasco-b10 Apr 14, 2026
17acd05
[Test] Add Stable Audio offline e2e TeaCache Test (#2377)
zhangj1an Apr 14, 2026
6d01a8b
[Omni Connector] Omni Transfer Engine Connector: Enable 1-receiver-to…
natureofnature Apr 14, 2026
3229bae
[skip ci] fix docs, gdown remove --id param (#2787)
lengrongfu Apr 14, 2026
159d655
[Tests][Qwen3-Omni]Add test cases for long videos and long audios. (#…
amy-why-3459 Apr 14, 2026
f87674a
[skip ci]add skills (#2710)
hsliuustc0106 Apr 14, 2026
bcd5f16
[Misc] clean Temporary CI Configs (#2784)
n1ptune Apr 14, 2026
5ce0a43
[CI][Bugfix] Update thresholds for accuracy tests (#2725)
yenuo26 Apr 14, 2026
cf1fcd5
[CI/BugFix] Fix Flaky Test for Qwen Omni Perf (#2754)
alex-jw-brooks Apr 14, 2026
4fb078a
[Bugfix] Reject /v1/audio/speech for Qwen omni models (#2763)
scyyh11 Apr 14, 2026
53a9cf4
fix: do not apply FP8 quant config to vision/audio encoders for pre-q…
ianliuy Apr 14, 2026
f03ab38
[BugFix] Fix NoneType' object has no attribute 'detach' (#2797)
amy-why-3459 Apr 14, 2026
bc4a659
[Bugfix] Make mrope kwargs optional in HunyuanImage3 get_mrope_input_…
ianliuy Apr 14, 2026
9e46a79
[Bugfix] Handle numpy array outputs when generate image (#1680)
lengrongfu Apr 15, 2026
02e5dc7
[Perf] VoxCPM2: streaming VAE + compile optimization (45% RTF reducti…
linyueqian Apr 15, 2026
a782ae4
[Perf] Enhance benchmark script to support baseline thresholds and pr…
yenuo26 Apr 15, 2026
227bab3
[Benchmark]Omni-modality model accuracy benchmark(Daily-Omni & seed-t…
amy-why-3459 Apr 15, 2026
0d02073
[CI] qwen image edit L4 accuracy test (#2761)
fhfuih Apr 15, 2026
61a3cbd
[Perf] Eliminate Hop 3 IPC overhead for single-stage diffusion via in…
SamitHuang Apr 15, 2026
6c6551d
[Feature] feat: add video frame interpolation postprocess (#2555)
david6666666 Apr 15, 2026
1ad726f
[Fix] HunyuanImage-3.0: unify naming hunyuan_image_3 → hunyuan_image3…
TaffyOfficial Apr 15, 2026
2dff2d7
[PERF] Wan2.2 support adalayernorm fused op (#2585)
fan2956 Apr 15, 2026
133e2f9
[hotfix] API connection error in CI (#2810)
fhfuih Apr 15, 2026
38d5f2d
[Perf] VoxCPM2: Speedup by manual CUDA Graph capture for scaffold/res…
Sy0307 Apr 15, 2026
4bf4c63
Add voxcpm model support. (#2467)
IsleOfDawnlight Apr 15, 2026
82f8c93
[Feat][Qwen3-Omni] Shared code predictor module for Qwen3-TTS and Qwe…
JuanPZuluaga Apr 15, 2026
50ae1de
[Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage (#2762)
Fishermanykx Apr 15, 2026
c6d76d0
[Bugfix] Fix broken fp8 quantisation on Z-Image-Turbo, Qwen-Image, FL…
zhangj1an Apr 15, 2026
f1e3f03
[feature] Hidden State Prefix Caching (#2164)
alex-jw-brooks Apr 15, 2026
e958113
[Perf] Add Performance Test for Qwen-Image Step-Level Execution (#2707)
wtomin Apr 15, 2026
880a758
[CI] Skip test_thinker_prefix_caching in tests/e2e/online_serving/tes…
yenuo26 Apr 16, 2026
c83f664
[CI][Perf] Add nightly PR labels, consolidate pipeline, and switch be…
yenuo26 Apr 16, 2026
de5f8a2
[Doc][Misc] Update DreamID-Omni Example; Add DreamID-Omni post proces…
yuanheng-zhao Apr 16, 2026
b43c6c6
[Feat] add GLM-Image SP support (#1983)
RuixiangMa Apr 16, 2026
24e61f4
[CI] add qwen image and layered accuracy test (#2772)
david6666666 Apr 16, 2026
4d816ff
[Feature] Bagel: Support tp+cfg parallel using mooncake transfer engi…
natureofnature Apr 16, 2026
f1cb4eb
[PERF] Wan2.2 support rmsnorm fused op (#2583)
fan2956 Apr 16, 2026
e8658b5
[Test] Add performance tests for Qwen-Image-Layered model (#2807)
kechengliu97 Apr 16, 2026
322620f
[Fix][Fish Speech] Remove redundant get_vocab() in control token enco…
Sy0307 Apr 16, 2026
45760d6
[Test] Skip tests for known issues in audio and speaker recognition …
yenuo26 Apr 16, 2026
2ec91d4
[FIX] Preserve YAML default stop words when request sends empty list …
QiuMike Apr 16, 2026
7d64a7c
[BugFix][VoxCPM2]: split multichar Chinese tokens to match training t…
Sy0307 Apr 16, 2026
c3ca5da
Feat/Add HunyuanImage-3.0-Instruct ar part support: (#2713)
TaffyOfficial Apr 16, 2026
817e32d
[Quantization] feat: add FP8 for Omnigen2 (#2441)
zhangj1an Apr 16, 2026
bf64a6b
[Feature] Flux2 klein inpaint (#1180)
RuixiangMa Apr 16, 2026
1e3bb36
[Refactor] Remove sox from dependencies (#2745)
NickCao Apr 17, 2026
3079e94
[Bugfix] enforce max_sequence_length for Qwen-Image and Wan2.2 series…
david6666666 Apr 17, 2026
1237882
[Bugfix] Preserve default diffusion sampling params in default stage …
david6666666 Apr 17, 2026
d463978
[Model] Support Flux1 Schnell (#2528)
alex-jw-brooks Apr 17, 2026
bbd6a44
[Core] Refactor CFG companion tracker and use in Orchestrator (#2623)
yinpeiqi Apr 17, 2026
a3ecde9
[CI][Bugfix] Fix the error in generating the performance data table a…
yenuo26 Apr 17, 2026
b88d3ce
[BugFix] Fixing occasional engine crashes caused by abort requests (#…
amy-why-3459 Apr 17, 2026
a5a4998
[Feature] Support Prefill-Decode disaggregation via vLLM KV transfer …
spencerr221 Apr 17, 2026
c0ccbb8
[Model] Add Ming-flash-omni-2.0 Thinker Stage (#1822)
yuanheng-zhao Apr 17, 2026
b7f2398
[Bugfix] Fix RIFE device selection for CPU-transported videos (#2876)
david6666666 Apr 17, 2026
f658bcb
[Bugfix] Limit Qwen-Image-Edit-2511 input image count (#2840)
david6666666 Apr 17, 2026
edb4f2f
[Test] Add ModelRunner V2 with Qwen3-TTS Base E2E Test to CI pipeline…
tzhouam Apr 17, 2026
cf75ae6
[Bugfix] Fix image quality in /v1/images/generations for multi-stage …
RuixiangMa Apr 17, 2026
6b7be88
Fix NoneType error of outputs (#2315)
QiuMike Apr 17, 2026
18ac679
[Refactor] refactor wan2.2 diffuse && add ut (#2672)
bjf-frz Apr 17, 2026
6c57ab7
[Misc] Warn When vLLM / vLLM-Omni Have Mismatched Versions (#2691)
alex-jw-brooks Apr 17, 2026
536f59b
[Bugfix] Fix cache dit for Longcat & LTX2 (#2860)
alex-jw-brooks Apr 17, 2026
b4add5b
[CI] Skip test_bagel[parallel_tp_2] and test_wan22_i2v_online_serving…
yenuo26 Apr 17, 2026
64d368d
[Bugfix] fix CI failure (#2884)
RuixiangMa Apr 17, 2026
f2edb81
[Cleanup] Remove dead runtime.defaults config parameters (#2343)
NickCao Apr 17, 2026
1637dba
[skip CI][Docs] Add Qwen3-Omni and Qwen3-TTS performance blog and fig…
Shirley125 Apr 17, 2026
b5ddff7
Nextstep online e2e (#2107)
Joshna-Medisetty Apr 17, 2026
f346f2f
Add Teacache Support for LongCat Image (#1487)
alex-jw-brooks Apr 17, 2026
5a68c21
[skip ci][recipe] draft vllm-omni recipes (#2646)
hsliuustc0106 Apr 18, 2026
4f71f73
[Docs] Update WeChat QR code for community support (#2895)
david6666666 Apr 18, 2026
d2c23d7
[Refactor] Remove resampy dependency (#2891)
NickCao Apr 18, 2026
4124a1f
[Feature]Support audio streaming input and output-phase2 (#2581)
Shirley125 Apr 18, 2026
768931e
[BugFix]: Fix multi-stage cfg bug (#2801)
princepride Apr 18, 2026
fe6cec6
[doc][skip ci] remove redundant content in readme (#2901)
Shirley125 Apr 18, 2026
9cf1fe7
[Feat] cache-dit for GLM-Image (#1399)
RuixiangMa Apr 18, 2026
9313f37
[Agent] Add NPU main2main skill (#2858)
gcanlin Apr 18, 2026
a683b1d
[Bugfix][VoxCPM2] Fix voice-clone decode loop by padding prefill prom…
Sy0307 Apr 18, 2026
a390381
[Config Refactor][2/N] Pipeline + Deploy Config Schema (#2383)
lishunyang12 Apr 19, 2026
26edc7f
[Bugfix][VoxCPM2]: Fix vectorized_gather OOB under concurrent prefill…
Sy0307 Apr 19, 2026
1568451
perf(helios): replace strided RoPE with stack+flatten for contiguous …
willamhou Apr 19, 2026
93beef1
[Bugfix] diffusion end points allow model mismatch (#2805)
xiaohajiayou Apr 19, 2026
68f28f9
[Feat] Support layerwise CPU offloading for more videogen models (#2018)
yuanheng-zhao Apr 19, 2026
cd384d9
[Config Refactor 2.5/N] Centralize pipeline registry (#2915)
lishunyang12 Apr 19, 2026
78f237e
[Perf] Optimize Wan2.2 device free on image preprocess (#2852)
fan2956 Apr 20, 2026
d435fe0
[Docs] update documents (#2921)
R2-Y Apr 20, 2026
0393c58
[BugFix] Fixed the issue where --no-async-chunk was not working. (#2934)
amy-why-3459 Apr 20, 2026
b2331a1
Merge upstream/main into moriio-fix
knitcapcat Apr 20, 2026
12 changes: 10 additions & 2 deletions .buildkite/pipeline.yml
@@ -44,11 +44,19 @@ steps:
agents:
queue: "cpu_queue_premerge"

# L4 Test — main+NIGHTLY=1 (scheduled), or PR with label nightly-test (e.g. add label then Rebuild)
# L4 Test — main+NIGHTLY=1 (scheduled), or PR with specific label (e.g. add label then Rebuild)
- label: "Upload Nightly Pipeline"
depends_on: image-build
key: upload-nightly-pipeline
if: '(build.branch == "main" && build.env("NIGHTLY") == "1") || (build.branch != "main" && build.pull_request.labels includes "nightly-test")'
if: >-
(build.branch == "main" && build.env("NIGHTLY") == "1") ||
(build.branch != "main" && (
build.pull_request.labels includes "nightly-test" ||
build.pull_request.labels includes "omni-test" ||
build.pull_request.labels includes "tts-test" ||
build.pull_request.labels includes "diffusion-x2iat-test" ||
build.pull_request.labels includes "diffusion-x2v-test"
))
commands:
- buildkite-agent pipeline upload .buildkite/test-nightly.yml
agents:
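The new `if:` condition fires the nightly upload on a scheduled `main` build, or on a PR carrying any of five trigger labels. As a rough sketch of that membership test — plain POSIX shell, not Buildkite's own condition evaluator; only the label names are taken from the diff:

```shell
# Approximate the Buildkite `if:` label check in POSIX sh.
# `labels` holds the PR's labels, comma-separated.
labels="docs,tts-test"

should_upload_nightly() {
  # True when any trigger label from the pipeline diff is present.
  for want in nightly-test omni-test tts-test diffusion-x2iat-test diffusion-x2v-test; do
    case ",$labels," in
      *",$want,"*) return 0 ;;
    esac
  done
  return 1
}

if should_upload_nightly; then
  echo "upload nightly pipeline"   # this label set matches tts-test
else
  echo "skip"
fi
```

Wrapping both sides in commas before matching avoids false hits on labels that merely contain a trigger name as a substring.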
118 changes: 62 additions & 56 deletions .buildkite/test-amd-merge.yml
Expand Up @@ -32,7 +32,6 @@ steps:
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export GPU_ARCHS=gfx942
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- |
@@ -55,28 +54,28 @@ steps:
# - export GPU_ARCHS=gfx942
# - export VLLM_LOGGING_LEVEL=DEBUG
# - export VLLM_WORKER_MULTIPROC_METHOD=spawn
# - timeout 20m pytest -s -v tests/e2e/offline_inference/test_stable_audio_model.py
# - timeout 20m pytest -s -v tests/e2e/offline_inference/test_stable_audio_expansion.py -m "advanced_model and diffusion and L4" --run-level advanced_model

- label: "Diffusion Cache Backend Test"
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export GPU_ARCHS=gfx942
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- timeout 15m pytest -s -v -m "core_model and cache and diffusion and not distributed_cuda and L4"

- label: "Diffusion Sequence Parallelism Test"
agent_pool: mi325_2
- label: "Diffusion Sequence Parallelism Test (Need 4 GPUs)"
agent_pool: mi325_4
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- timeout 20m pytest -s -v tests/e2e/offline_inference/test_sequence_parallel.py
- timeout 20m pytest -s -v tests/diffusion/distributed/test_ulysses_uaa_perf.py

# merge-only tests
- label: "Diffusion Tensor Parallelism Test"
@@ -95,22 +94,14 @@ steps:
commands:
- timeout 20m pytest -s -v tests/diffusion/test_diffusion_worker.py

- label: "Benchmark & Engine Test"
agent_pool: mi325_2
- label: "Engine Test"
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- |
timeout 20m bash -c '
set +e
pytest -s -v tests/benchmarks/test_serve_cli.py
EXIT1=\$?
pytest -s -v tests/engine/test_async_omni_engine_abort.py
EXIT2=\$?
exit \$((EXIT1 | EXIT2))
'
- timeout 20m pytest -s -v tests/engine/test_async_omni_engine_abort.py

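The removed wrapper used a shell idiom worth noting: run every suite even when an earlier one fails, then bitwise-OR the exit statuses so the step still reports failure if any suite failed. A minimal sketch, with `false`/`true` standing in for the pytest invocations:

```shell
# Run two commands unconditionally and fail if either failed.
run_both() {
  set +e           # do not abort on the first failure
  "$1"; EXIT1=$?   # first suite
  "$2"; EXIT2=$?   # second suite
  return $((EXIT1 | EXIT2))   # non-zero if either status was non-zero
}

# `false` stands in for a failing pytest run, `true` for a passing one.
run_both false true
echo "combined status: $?"    # prints "combined status: 1"
```

Passing the suites as separate paths to a single `pytest` call, as this diff does instead, gets the same all-or-nothing exit status with less ceremony.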
- label: "Omni Model Test Qwen2-5-Omni"
agent_pool: mi325_2
@@ -121,6 +112,7 @@ steps:
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- timeout 20m pytest -s -v tests/e2e/offline_inference/test_qwen2_5_omni.py
- timeout 20m pytest -s -v tests/e2e/online_serving/test_qwen2_5_omni.py -m "advanced_model" --run-level "advanced_model"

- label: "Omni Model Test Qwen3-Omni"
agent_pool: mi325_2
@@ -131,11 +123,10 @@ steps:
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- export VLLM_TEST_CLEAN_GPU_MEMORY=1
- timeout 10m pytest -s -v tests/e2e/offline_inference/test_qwen3_omni.py
- timeout 20m pytest -s -v tests/e2e/online_serving/test_qwen3_omni.py -m "advanced_model" --run-level "advanced_model"
- timeout 30m pytest -s -v tests/e2e/offline_inference/test_qwen3_omni.py tests/e2e/online_serving/test_qwen3_omni.py tests/e2e/online_serving/test_mimo_audio.py -m "advanced_model" --run-level "advanced_model"

- label: "Qwen3-TTS CustomVoice E2E Test"
agent_pool: mi325_2
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
@@ -145,21 +136,21 @@ steps:
export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_ALLOW_LONG_MAX_MODEL_LEN="1"
pytest -s -v tests/e2e/online_serving/test_qwen3_tts_customvoice.py -m "advanced_model" --run-level "advanced_model" && pytest -s -v tests/e2e/offline_inference/test_qwen3_tts_customvoice.py
pytest -s -v tests/e2e/online_serving/test_qwen3_tts_customvoice.py tests/e2e/offline_inference/test_qwen3_tts_customvoice.py -m "advanced_model" --run-level "advanced_model"
'

- label: "Qwen3-TTS Base E2E Test"
agent_pool: mi325_2
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- |
timeout 20m bash -c '
timeout 30m bash -c '
export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export VLLM_ALLOW_LONG_MAX_MODEL_LEN="1"
pytest -s -v tests/e2e/online_serving/test_qwen3_tts_base.py -m "advanced_model" --run-level "advanced_model" && pytest -s -v tests/e2e/offline_inference/test_qwen3_tts_base.py
pytest -s -v tests/e2e/online_serving/test_qwen3_tts_base.py tests/e2e/offline_inference/test_qwen3_tts_base.py -m "advanced_model" --run-level "advanced_model"
'

- label: "Diffusion Image Edit Test"
@@ -173,43 +164,58 @@ steps:
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- timeout 20m pytest -s -v tests/e2e/online_serving/test_image_gen_edit.py

# split Bagel Model Test with H100 (Real Weights) into three tests
- label: "Bagel Text2Img Model Test"
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export GPU_ARCHS=gfx942
- export VLLM_TEST_CLEAN_GPU_MEMORY=1
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- export VLLM_ROCM_USE_AITER_RMSNORM=0
- timeout 30m pytest -s -v tests/e2e/offline_inference/test_bagel_text2img.py -m "advanced_model" --run-level "advanced_model" -k "shared_memory" -k "rocm"
# TODO: Bagel test on ROCm is very unstable. @tjtanaa
# Need to debug before re-enabling; numerical changes across large PRs
# # split Bagel Model Test with H100 (Real Weights) into three tests
# - label: "Bagel Text2Img Model Test (1/3)"
# agent_pool: mi325_1
# depends_on: amd-build
# mirror_hardwares: [amdproduction]
# grade: Blocking
# commands:
# - export GPU_ARCHS=gfx942
# - export VLLM_TEST_CLEAN_GPU_MEMORY=1
# - export VLLM_LOGGING_LEVEL=DEBUG
# - export VLLM_WORKER_MULTIPROC_METHOD=spawn
# - export VLLM_ROCM_USE_AITER_RMSNORM=0
# - timeout 30m pytest -s -v tests/e2e/offline_inference/test_bagel_text2img.py -m "advanced_model" --run-level "advanced_model" -k "shared_memory" -k "rocm"

- label: "Bagel Img2Img Model Test"
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export GPU_ARCHS=gfx942
- export VLLM_TEST_CLEAN_GPU_MEMORY=1
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- export VLLM_ROCM_USE_AITER_RMSNORM=0
- timeout 30m pytest -s -v tests/e2e/offline_inference/test_bagel_img2img.py -m "advanced_model" --run-level "advanced_model" -k "rocm"
# - label: "Bagel Img2Img Model Test (2/3)"
# agent_pool: mi325_1
# depends_on: amd-build
# mirror_hardwares: [amdproduction]
# grade: Blocking
# commands:
# - export GPU_ARCHS=gfx942
# - export VLLM_TEST_CLEAN_GPU_MEMORY=1
# - export VLLM_LOGGING_LEVEL=DEBUG
# - export VLLM_WORKER_MULTIPROC_METHOD=spawn
# - export VLLM_ROCM_USE_AITER_RMSNORM=0
# - timeout 30m pytest -s -v tests/e2e/offline_inference/test_bagel_img2img.py -m "advanced_model" --run-level "advanced_model" -k "rocm"

# - label: "Bagel Online Serving Test (3/3)"
# agent_pool: mi325_1
# depends_on: amd-build
# mirror_hardwares: [amdproduction]
# grade: Blocking
# commands:
# - export GPU_ARCHS=gfx942
# - export VLLM_TEST_CLEAN_GPU_MEMORY=1
# - export VLLM_IMAGE_FETCH_TIMEOUT=60
# - export VLLM_LOGGING_LEVEL=DEBUG
# - export VLLM_WORKER_MULTIPROC_METHOD=spawn
# - export VLLM_ROCM_USE_AITER_RMSNORM=0
# - timeout 40m pytest -s -v tests/e2e/online_serving/test_bagel_online.py -m "advanced_model" --run-level "advanced_model" -k "rocm"

- label: "Bagel Online Serving Test"
- label: "Voxtral-TTS E2E Test"
agent_pool: mi325_1
depends_on: amd-build
mirror_hardwares: [amdproduction]
grade: Blocking
commands:
- export GPU_ARCHS=gfx942
- export VLLM_TEST_CLEAN_GPU_MEMORY=1
- export VLLM_IMAGE_FETCH_TIMEOUT=60
- export VLLM_LOGGING_LEVEL=DEBUG
- export VLLM_WORKER_MULTIPROC_METHOD=spawn
- export VLLM_ROCM_USE_AITER_RMSNORM=0
- timeout 40m pytest -s -v tests/e2e/online_serving/test_bagel_online.py -m "advanced_model" --run-level "advanced_model" -k "rocm"
- |
timeout 20m bash -c '
export VLLM_LOGGING_LEVEL=DEBUG
export VLLM_WORKER_MULTIPROC_METHOD=spawn
pytest -s -v tests/e2e/online_serving/test_voxtral_tts.py tests/e2e/offline_inference/test_voxtral_tts.py -m "advanced_model" --run-level "advanced_model"
'