Merged
204 commits
642f169
[Bugfix] Update Whisper model loading to support multi-GPU configurat…
yenuo26 Mar 31, 2026
f8d0bf5
[release] Add nightly wheel release index (#2345)
khluu Mar 31, 2026
369f301
[BugFix] Add BAGEL single-stage diffusion config and fix multiple `<i…
princepride Apr 1, 2026
dd0b6fd
[Bugfix] Fix layer-wise offload incompatibility with HSDP (#2021)
RuixiangMa Apr 1, 2026
7274e15
[BugFix] qwen3_tts chunk boundary handling logic in initial chunk (IC…
Fattysand Apr 1, 2026
7b965a7
[Feat][Benchmark] Add synchronous video generation endpoint POST /v1/…
SamitHuang Apr 1, 2026
183775e
[Docs] Update WeChat QR code for community support (#2402)
david6666666 Apr 1, 2026
080593c
[CI] [skip ci]Nightly Report Optim (#2406)
congw729 Apr 1, 2026
c3376a4
[Feature][HunyuanImage3.0] Add cfgP to HunyuanImage3.0 (#1751)
nussejzz Apr 1, 2026
08cb436
Fix: ensure input tensor is contiguous in GroupCoordinator.all_gather…
daixinning Apr 1, 2026
d40840b
[Perf] Bagel KV-ready early forwarding and time step consistency for …
natureofnature Apr 1, 2026
3fd4a4d
[Feat] Support step-boundary abort in diffusion (#1769)
asukaqaq-s Apr 1, 2026
bf5bd0a
[BugFix]: Fix bagel single-stage img2img fallback to text2img bug (#2…
princepride Apr 1, 2026
3def008
[Feat] Add MUSA platform support for Moore Threads GPUs (#2337)
yeahdongcn Apr 1, 2026
6ef0e90
Add new committers to governance page (#2419)
ywang96 Apr 1, 2026
4e4bbc4
[CI] Tune GPU resources for test (#2401)
tjtanaa Apr 1, 2026
70a6265
[Feat] support HSDP for Qwen-image series, Z-Image, GLM-Image (#2029)
RuixiangMa Apr 1, 2026
bbae904
[Bugfix] Fix delayed decoding bug for Bagel AR/DIT workflow (L3 test_…
natureofnature Apr 1, 2026
9595be5
[skip ci][Doc] Update RFC template doc (#2141)
yuanheng-zhao Apr 2, 2026
9c2a576
[Test] Add voice or language test case for Qwen3-omni and Qwen-tts (#…
yenuo26 Apr 2, 2026
ebc9a8d
[skip ci][Doc] Small fix of Doc (#2400)
wtomin Apr 2, 2026
d3daafb
[Feat] Add benchmarks for Qwen3-TTS Base/VoiceDesign Model (#2411)
JasonJ2021 Apr 2, 2026
900f6aa
[CI] [skip ci] Rename & reset timout mins for nightly L4 tests. (#2251)
congw729 Apr 2, 2026
c1d2dcc
[AutoRound] Add offline quantized `W4A16` model support (#1777)
yiliu30 Apr 2, 2026
e2892ef
[Perf] Optimize Wan2.2 rotary embedding (#2393)
gcanlin Apr 2, 2026
458f402
Add VACE support for WAN 2.1 conditional video generation (#1885)
tangbinh Apr 2, 2026
ca02351
[skip ci][Bugfix] clean useless log (#2450)
R2-Y Apr 2, 2026
50bb47a
[Test] Skip tests/e2e/online_serving/test_zimage_expansion.py due to …
zhumingjue138 Apr 2, 2026
728cf6d
[Feature] add session based audio streaming input (#2208)
Shirley125 Apr 2, 2026
6211413
Update MRoPE config fallback logic (#2278)
vraiti Apr 2, 2026
6be5d05
[Docs] Update docs to use vllm-ascend v0.18.0rc1 (#2453)
gcanlin Apr 3, 2026
fa275fd
[BAGEL] [Feature]: Add `thinking mode` in Bagel multi-stage serving (…
princepride Apr 3, 2026
7fb86d5
[BugFix][FishSpeech] Fix structured voice clone prefill conditioning …
Sy0307 Apr 3, 2026
563f73b
Refactor StageDiffusionClient and StageEngineCoreClient (#2006)
chickeyton Apr 3, 2026
6dc61c9
[Perf] Skip Wan2.2 cross attn Ulysses SP (#2459)
gcanlin Apr 3, 2026
cd71567
[Model] Add two stages inference for model LTX-2 distilled. (#2260)
Songrui625 Apr 3, 2026
515d15e
[Cleanup] Replace bare print() with logger and use specific exception…
Lidang-Jiang Apr 3, 2026
10db95f
[Bugfix] Fix Flux2 Dev Guidance (#2433)
alex-jw-brooks Apr 3, 2026
0e83ebe
[OmniVoice] Add two-stage TTS serving support (#2463)
linyueqian Apr 3, 2026
f50c5a4
[Qwen3TTS] [TTS] [Feat] Refactor voice cache manager (#2108)
JuanPZuluaga Apr 3, 2026
4c03158
[CosyVoice3] Add online serving support, fix stage config, and add CI…
linyueqian Apr 4, 2026
2804a85
[Rebase] Rebase to vllm v0.19.0 (#2475)
tzhouam Apr 4, 2026
191b9a8
Voxtral TTS: drop hardcoded CUDA in audio tokenizer; add XPU stage co…
Joshna-Medisetty Apr 4, 2026
0059ec8
[Model Support]: Magihuman support (#2301)
princepride Apr 4, 2026
f2227d3
[Docs] Update WeChat QR code for community support (#2481)
david6666666 Apr 4, 2026
0cf3819
[CI] Fix missing queue for Voxtral-TTS E2E test step (#2484)
linyueqian Apr 4, 2026
d92439c
[CosyVoice3] Fix vLLM 0.19.0 compatibility issues (#2486)
linyueqian Apr 5, 2026
6fc38e0
[Model][Core] Enable async_chunk streaming pipeline for CosyVoice3 (#…
indevn Apr 5, 2026
094907e
[Chore] Fix Bagel model import compatibility (#2491)
yuanheng-zhao Apr 5, 2026
0824ede
ci: remove CosyVoice3 post-merge test (#2492)
linyueqian Apr 5, 2026
832952b
[Feat] add diffusion pipeline profiler and progress bar support to Fl…
RuixiangMa Apr 5, 2026
dd9ca6f
[Bugfix] Include uv.lock in .gitignore (#2493)
timzsu Apr 5, 2026
88f7ed9
[Bugfix] Assign original prompt back to RequestOutput (#2498)
yuanheng-zhao Apr 5, 2026
b2b2ab0
[CI/Build] Add Dockerfile.cuda for NVIDIA GPU users [Skip-CI] (#1439)
loveysuby Apr 5, 2026
025408f
[Fix] [Qwen3-TTS] Qwen3-TTS streaming chunk-boundary artifacts (#2480)
Sy0307 Apr 5, 2026
f6cfacd
[Perf][Qwen3-TTS] Free unused decoder in Talker SpeechTokenizer to VR…
Sy0307 Apr 5, 2026
8b57c62
[Perf][Fish Speech] Free unused DAC codec components to save VRAM (#2…
Sy0307 Apr 5, 2026
e23b263
fix(qwen3_tts): align code predictor buffer dtype with model paramete…
willamhou Apr 5, 2026
328de58
[Feat] support for multi-block layerwise offloading, fix top-level pa…
RuixiangMa Apr 6, 2026
486d77d
[Feature] Enable LoRA adapter injection for BAGEL (#2490)
timzsu Apr 6, 2026
e771842
[Feature] Support vae tiling parallel encode (#2368)
gcanlin Apr 6, 2026
54e964d
[Bugfix] Fix load_weights fallback for non-fused stacked_params_mappi…
timzsu Apr 6, 2026
5b2c4f9
[BugFix] Add bagel text2text/img2text think mode support (#2503)
princepride Apr 7, 2026
8dd66ce
[BugFix] Continue decode if don't need transfer kv cache between two …
princepride Apr 7, 2026
93a3fcf
[CI] Add doc-only change detection to skip Buildkite CI. (#1284)
congw729 Apr 7, 2026
368de99
[Test] Test whether CI can be correctly skipped when the committed fi…
yenuo26 Apr 7, 2026
7a72f34
Add supports_float64() to OmniPlatform and clean up MPS (#2488)
yeahdongcn Apr 7, 2026
08e2e1f
[Bugfix] Fix DataType Handling in Default Diffusion Config (#2530)
alex-jw-brooks Apr 7, 2026
0304c97
[Docs] Add installation guide for Moore Threads (MUSA) GPUs (#2359)
yeahdongcn Apr 7, 2026
5d4c9ec
[bugfix]bugfix dreamid (#2125)
erfgss Apr 7, 2026
badbe8e
[RFC] Offload blocking TTS/speech ops to thread pool to unblock event…
scyyh11 Apr 7, 2026
0998b30
[Bugfix] To resolve timeout error, update nightly test commands for d…
yenuo26 Apr 7, 2026
9584dd6
[HunyuanImage3] Align system_prompt support with official implementat…
skf-1999 Apr 7, 2026
340cba7
[daVinci-MagiHuman][Doc][BugFix] Update model support for daVici-Magi…
princepride Apr 7, 2026
408365f
[Bagel]Fused gate_proj and up_proj (#2546)
princepride Apr 7, 2026
feefdae
[Bugfix] Accept 'speaker' as alias for 'voice' in TTS speech API (#2424)
marksverdhei Apr 7, 2026
c9dbc09
[Bugfix] Prevent Silent Stage Dropouts: fix coordinator reconnect bug…
pikaxinge Apr 7, 2026
bc5e945
[release] Fix release script (#2566)
khluu Apr 7, 2026
b246617
[release] Fix lint issue (#2567)
khluu Apr 8, 2026
8a55d3d
[Feat] Enable Layerwise CPU offloading for SD3.5, Ovis-Image, Nextste…
yuanheng-zhao Apr 8, 2026
6433847
[skipCI][Docs] Add expert_parallel.md (#2471)
skf-1999 Apr 8, 2026
cb6a873
[Feature] Add trajectory recording to BAGEL denoising loop (#2483)
timzsu Apr 8, 2026
ec082ad
[Perf] Wan2.2 I2V optimization: convert datatype from FP32 to BF16 in…
Fishermanykx Apr 8, 2026
1cd5210
[Diffusion] Refactor LTX2 to use unified CFG parallel framework (#2160)
TKONIY Apr 8, 2026
8609bc8
[Feat] image2image for Z-Image (#1580)
RuixiangMa Apr 8, 2026
c3c736d
[Feature] Port Bagel RDMA flow to latest main (#2000)
ahengljh Apr 8, 2026
fb3c6bd
[Feat] Add MUSA flash attention support via mate package (#2451)
yeahdongcn Apr 8, 2026
aefa2ee
[Fix] Align diffusion proc test mock with current output fields (#2584)
ahengljh Apr 8, 2026
7e7efdd
[Bugfix] Fix benchmark Total input tokens for multimodal requests (#2…
Dnoob Apr 8, 2026
fcda835
[Unit Test] Add unit tests for orchestrator (#2096)
yinpeiqi Apr 8, 2026
2c6c07c
[TTS] Add missing _generate_pcm_chunks for OmniOpenAIServingSpeech st…
vveerrgg Apr 8, 2026
149b9f1
[Perf][Qwen3-TTS][Voxtral-TTS] Share CUDA graph memory pool across de…
NickCao Apr 9, 2026
c3f1042
[Feature] End-to-end LoRA support for BAGEL (#2494)
timzsu Apr 9, 2026
e6f88f7
[CI] Reorganize the L1 L2 use cases and add markers (#2449)
zhumingjue138 Apr 9, 2026
3bd8a52
[Bugfix] Enforce --max-generated-image-size on /v1/images/generations…
NickCao Apr 9, 2026
0edc356
[CI]Refactor nightly test configuration in Buildkite, Add group for O…
yenuo26 Apr 9, 2026
ed7a448
[Bugfix] Guard app.state access during server shutdown (#2587)
pjh4993 Apr 9, 2026
9d87229
[MagiHuman] Fix audio sample rate and fps propagation for online serv…
princepride Apr 9, 2026
92c788e
[Misc] Clean up method name in BAGEL. (#2501)
timzsu Apr 9, 2026
0e8e630
[Feat] /v1/images/generations api supports request cancel (#2621)
Semmer2 Apr 9, 2026
9225039
[Bug] Lazy-import entrypoints to fix subprocess pynvml crash (#2187)
RGB-loop Apr 9, 2026
a7bf405
[Docs] Add multi-thread weight loading documentation (#2445)
SamitHuang Apr 9, 2026
e2b0ee4
[Model] Add Dynin-omni model in vllm-omni (#1759)
DOGEUNNKIM Apr 9, 2026
2d98013
[Bugfix] Fix precedence between caller runtime args and default stage…
xiaohajiayou Apr 9, 2026
d2aa9cf
Revert "[Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#1982…
ZeldaHuang Apr 9, 2026
956f53b
[Refactor] Use trajectory_* fields for Qwen-Image structured RL outpu…
SamitHuang Apr 9, 2026
85d63c4
[Bugfix] Fix Qwen-Image min-size normalization for tiny requests (#2637)
david6666666 Apr 9, 2026
694be6f
[Bugfix] Fix Fish Speech voice clone FileNotFoundError on multi-GPU (…
Sy0307 Apr 9, 2026
4b6d929
[CI][Bugfix] Update environment variables for test configurations in …
yenuo26 Apr 9, 2026
0c46ba5
[Bugfix] restore legacy stage config precedence (#2663)
xiaohajiayou Apr 10, 2026
9423243
[Feat][FishSpeech] Cache DAC-encoded ref audio for voice cloning (#2609)
linyueqian Apr 10, 2026
86985ed
[CI] Update merge condition in upload_pipeline_with_skip_ci.sh to inc…
yenuo26 Apr 10, 2026
f3f2dc5
[Feature]: support Flux.2-dev CFG-Parallel (#2010)
nuclearwu Apr 10, 2026
cb91cbe
[Entrypoint][Refactor]Stage CLI Refactor (#2020)
wuhang2014 Apr 10, 2026
c1da480
[CI] Update merge condition in upload_pipeline_with_skip_ci.sh to inc…
yenuo26 Apr 10, 2026
c2ae58b
[Bugfix] fix mindiesd laserattention unsupported error (#2673)
fan2956 Apr 10, 2026
fbb5dd5
[Bugfix]: modify diffusion pipeline profiler result in videos (#2647)
bjf-frz Apr 10, 2026
78bef62
[Profiler] Add Nsight Systems support for serving (#1098)
ahengljh Apr 10, 2026
687405c
[Config] Remove invalid LLM-only engine_args from diffusion stage con…
ianliuy Apr 10, 2026
2bc183f
[Refactor] Remove dependency on librosa (#2273)
NickCao Apr 10, 2026
a41174e
[Model] VoxCPM2 native AR TTS support (#2658)
linyueqian Apr 11, 2026
001f2e3
[BUG FIX]: prevent EngineCore crash when Qwen TTS Base task is missin…
teith Apr 11, 2026
d1fef41
[Doc] Add LTX-2 online serving deployment recipes with optimization b…
SamitHuang Apr 11, 2026
c9e8411
[feature] : add cache-dit for stable-audio-open-1.0 (#1341)
akshatvishu Apr 11, 2026
25c0566
[ROCm] [CI] [Bugfix] Resurface CI Signal, fix MHA AR selection, sync …
tjtanaa Apr 11, 2026
eccee21
[Perf] Use global CUDA graph pool for MiMo Audio (#2657)
NickCao Apr 11, 2026
f7e8df9
[TTS][OmniVoice] Add voice cloning support for OmniVoice TTS (#2676)
JuanPZuluaga Apr 11, 2026
6e93595
[CI] [Resource] Remove unused test cases to cutdown agent resources u…
tjtanaa Apr 11, 2026
c20cac8
[Bugfix] Restore user config/runtime stage init timeout (#2519)
yuanheng-zhao Apr 11, 2026
38dfe56
[Bugfix] Validate speaker in chat endpoint and fix case-insensitive l…
reidliu41 Apr 12, 2026
73fb68a
[Docs] Update WeChat QR code for community support (#2701)
david6666666 Apr 12, 2026
5d58abb
[Log] Wire stat loggers into AsyncOmniEngine to match AsyncLLM (#2551)
gcanlin Apr 12, 2026
ef230ac
[Bugfix] Fix Incompatible Multihook Integration (TeaCache <-> CPU Off…
alex-jw-brooks Apr 12, 2026
16041ab
[Refactor] Extend CFG Parallel to support 3 or 4 branch dispatch acro…
zzhuoxin1508 Apr 12, 2026
95b5b2e
[Bugfix] Fix UT for the missing of log_stats in Engine (#2706)
gcanlin Apr 12, 2026
2dce028
[ROCm] [CI] Fix environment issue (#2708)
tjtanaa Apr 12, 2026
eb1a801
[Feat] Override single stage CLI args when stage_configs_path is set …
timzsu Apr 13, 2026
e122501
[Bugfix] Fix Bagel online mode for 1. Hang after several requests 2…
natureofnature Apr 13, 2026
cb4d13a
[Perf][Fish Speech] Enable CUDA Graph capture for Fast AR code predic…
Sy0307 Apr 13, 2026
8097747
[Model] Adapt Wan2.2-I2V-A14B via LightX2V offline conversion path (#…
Celeste-jq Apr 13, 2026
d9e745c
[Fix] VoxCPM2: support raw audio for voice cloning via OpenAI API (#2…
linyueqian Apr 13, 2026
2226143
[CI][Bugfix] Refactor the test case to add support for increasing ini…
yenuo26 Apr 13, 2026
2b70e89
[Revert] Revert "[Log] Wire stat loggers into AsyncOmniEngine to matc…
amy-why-3459 Apr 13, 2026
0d4e975
[core]refactor communication layer: PR1(Added Refactor Infra Only) (#…
natureofnature Apr 13, 2026
cd2761e
[Feature]: support Flux.2-dev tea_cache (#1871)
nuclearwu Apr 13, 2026
155583f
[Bugfix] Release stage launch lock before handshake (#2717)
fake0fan Apr 13, 2026
ef3f72b
[Tests][Qwen3-Omni]Modify Qwen3-Omni performance test cases (#2600)
amy-why-3459 Apr 13, 2026
2c67c30
[Bagel]: Support `think mode` in single stage deployment of Bagel (#2…
princepride Apr 13, 2026
e0cdbe9
[Misc] Cleanup: use consistent pytest-mock in unit tests (#2698)
yuanheng-zhao Apr 13, 2026
2a1d506
[skip ci][doc]Update async_chunk design diagram (#2420)
amy-why-3459 Apr 13, 2026
6b5a52a
[Bugfix] Update Flux2-dev & Dynin_omni L4 e2e test (#2723)
wtomin Apr 13, 2026
c9e2e3e
[Voxtral TTS] Correct decode steps param in Voxtral TTS (#2524)
y123456y78 Apr 13, 2026
14f7910
[Perf]: Speedup VoxCPM2 TTS performance and Support PagedAttention (#…
Sy0307 Apr 13, 2026
dd13891
[Voxtral TTS] Fix Voxtral TTS input with text and ref_audio (#2750)
y123456y78 Apr 13, 2026
8d23549
[CI] Qwen image edit performance benckmark (#2216)
fhfuih Apr 14, 2026
a5b38b5
[BugFix] Remove stage_configs_path validation (#2741)
amy-why-3459 Apr 14, 2026
644edac
[Perf] Optimize MP4 encoding latency in video generation (#2735)
SamitHuang Apr 14, 2026
48c30bc
[Qwen3-TTS] Remove hardcoded `distributed_executor_backend` to improv…
iancarrasco-b10 Apr 14, 2026
17acd05
[Test] Add Stable Audio offline e2e TeaCache Test (#2377)
zhangj1an Apr 14, 2026
6d01a8b
[Omni Connector] Omni Transfer Engine Connector: Enable 1-receiver-to…
natureofnature Apr 14, 2026
3229bae
[skip ci] fix docs, gdown remove --id param (#2787)
lengrongfu Apr 14, 2026
159d655
[Tests][Qwen3-Omni]Add test cases for long videos and long audios. (#…
amy-why-3459 Apr 14, 2026
f87674a
[skip ci]add skills (#2710)
hsliuustc0106 Apr 14, 2026
bcd5f16
[Misc] clean Temporary CI Configs (#2784)
n1ptune Apr 14, 2026
5ce0a43
[CI][Bugfix] Update thresholds for accuracy tests (#2725)
yenuo26 Apr 14, 2026
cf1fcd5
[CI/BugFix] Fix Flaky Test for Qwen Omni Perf (#2754)
alex-jw-brooks Apr 14, 2026
4fb078a
[Bugfix] Reject /v1/audio/speech for Qwen omni models (#2763)
scyyh11 Apr 14, 2026
53a9cf4
fix: do not apply FP8 quant config to vision/audio encoders for pre-q…
ianliuy Apr 14, 2026
f03ab38
[BugFix] Fix NoneType' object has no attribute 'detach' (#2797)
amy-why-3459 Apr 14, 2026
bc4a659
[Bugfix] Make mrope kwargs optional in HunyuanImage3 get_mrope_input_…
ianliuy Apr 14, 2026
9e46a79
[Bugfix] Handle numpy array outputs when generate image (#1680)
lengrongfu Apr 15, 2026
02e5dc7
[Perf] VoxCPM2: streaming VAE + compile optimization (45% RTF reducti…
linyueqian Apr 15, 2026
a782ae4
[Perf] Enhance benchmark script to support baseline thresholds and pr…
yenuo26 Apr 15, 2026
227bab3
[Benchmark]Omni-modality model accuracy benchmark(Daily-Omni & seed-t…
amy-why-3459 Apr 15, 2026
0d02073
[CI] qwen image edit L4 accuracy test (#2761)
fhfuih Apr 15, 2026
61a3cbd
[Perf] Eliminate Hop 3 IPC overhead for single-stage diffusion via in…
SamitHuang Apr 15, 2026
6c6551d
[Feature] feat: add video frame interpolation postprocess (#2555)
david6666666 Apr 15, 2026
1ad726f
[Fix] HunyuanImage-3.0: unify naming hunyuan_image_3 → hunyuan_image3…
TaffyOfficial Apr 15, 2026
2dff2d7
[PERF] Wan2.2 support adalayernorm fused op (#2585)
fan2956 Apr 15, 2026
133e2f9
[hotfix] API connection error in CI (#2810)
fhfuih Apr 15, 2026
38d5f2d
[Perf] VoxCPM2: Speedup by manual CUDA Graph capture for scaffold/res…
Sy0307 Apr 15, 2026
4bf4c63
Add voxcpm model support. (#2467)
IsleOfDawnlight Apr 15, 2026
82f8c93
[Feat][Qwen3-Omni] Shared code predictor module for Qwen3-TTS and Qwe…
JuanPZuluaga Apr 15, 2026
50ae1de
[Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage (#2762)
Fishermanykx Apr 15, 2026
c6d76d0
[Bugfix] Fix broken fp8 quantisation on Z-Image-Turbo, Qwen-Image, FL…
zhangj1an Apr 15, 2026
f1e3f03
[feature] Hidden State Prefix Caching (#2164)
alex-jw-brooks Apr 15, 2026
e958113
[Perf] Add Performance Test for Qwen-Image Step-Level Execution (#2707)
wtomin Apr 15, 2026
880a758
[CI] Skip test_thinker_prefix_caching in tests/e2e/online_serving/tes…
yenuo26 Apr 16, 2026
c83f664
[CI][Perf] Add nightly PR labels, consolidate pipeline, and switch be…
yenuo26 Apr 16, 2026
de5f8a2
[Doc][Misc] Update DreamID-Omni Example; Add DreamID-Omni post proces…
yuanheng-zhao Apr 16, 2026
b43c6c6
[Feat] add GLM-Image SP support (#1983)
RuixiangMa Apr 16, 2026
24e61f4
[CI] add qwen image and layered accuracy test (#2772)
david6666666 Apr 16, 2026
4d816ff
[Feature] Bagel: Support tp+cfg parallel using mooncake transfer engi…
natureofnature Apr 16, 2026
f1cb4eb
[PERF] Wan2.2 support rmsnorm fused op (#2583)
fan2956 Apr 16, 2026
e8658b5
[Test] Add performance tests for Qwen-Image-Layered model (#2807)
kechengliu97 Apr 16, 2026
322620f
[Fix][Fish Speech] Remove redundant get_vocab() in control token enco…
Sy0307 Apr 16, 2026
45760d6
[Test] Skip tests for known issues in audio and speaker recognition …
yenuo26 Apr 16, 2026
2ec91d4
[FIX] Preserve YAML default stop words when request sends empty list …
QiuMike Apr 16, 2026
7d64a7c
[BugFix][VoxCPM2]: split multichar Chinese tokens to match training t…
Sy0307 Apr 16, 2026
c3ca5da
Feat/Add HunyuanImage-3.0-Instruct ar part support: (#2713)
TaffyOfficial Apr 16, 2026
817e32d
[Quantization] feat: add FP8 for Omnigen2 (#2441)
zhangj1an Apr 16, 2026
7231338
Add Claude code review workflow
lishunyang12 Apr 16, 2026
a5f46d7
Fix claude-review trigger condition
lishunyang12 Apr 16, 2026
b721f10
Add id-token permission for claude-code-action
lishunyang12 Apr 16, 2026
20 changes: 20 additions & 0 deletions .buildkite/nightly-release-pipeline.yaml
@@ -0,0 +1,20 @@
steps:
  - label: "Build and upload wheel"
    key: "build-wheel"
    agents:
      queue: cpu_queue_release
    commands:
      - "curl -LsSf https://astral.sh/uv/install.sh | sh"
      - 'export PATH="$HOME/.local/bin:$PATH"'
      - "uv venv --python=3.12 && source .venv/bin/activate"
      - "uv pip install --upgrade build"
      - "python3 -m build"
      - "bash .buildkite/scripts/upload-nightly-wheels.sh"

  - label: "Generate and upload wheel indices"
    depends_on: "build-wheel"
    allow_dependency_failure: true
    agents:
      queue: small_cpu_queue_release
    commands:
      - "bash .buildkite/scripts/generate-and-upload-nightly-index.sh"
2 changes: 1 addition & 1 deletion .buildkite/pipeline-intel.yaml
@@ -10,7 +10,7 @@ steps:
 DOCKER_BUILDKIT: "1"
 # Buildkite will automatically replace this with the actual commit hash
 VLLM_IMAGE_TAG: "${BUILDKITE_COMMIT}"
-VLLM_VERSION: "v0.18.0"
+VLLM_VERSION: "v0.19.0"
 priority: 100
 timeout_in_minutes: 60
 soft_fail: true
31 changes: 27 additions & 4 deletions .buildkite/pipeline.yml
@@ -1,6 +1,21 @@
+# Document 1: Buildkite loads only this block on first parse. The next step resolves docs-only skip-ci
+# from git diff, then uploads document 2. When docs-only skip applies, image-build still runs if nightly-test
+# / main NIGHTLY so upload-nightly is not skipped together with test-ready/test-merge.
+#
+# Document 2: appended after `---`; same file, read by upload_pipeline_with_skip_ci.sh (not evaluated as a second pipeline by Buildkite).
+steps:
+  - label: ":github: Resolve skip-ci & upload pipeline"
+    key: upload-ci-pipeline
+    commands:
+      - "bash .buildkite/scripts/upload_pipeline_with_skip_ci.sh"
+    agents:
+      queue: "cpu_queue_premerge"
+
+---
 steps:
   - label: ":docker: Build image"
     key: image-build
+    if: __IMAGE_BUILD_IF__
     commands:
       - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
       - "docker build --progress=plain --file docker/Dockerfile.ci -t vllm-omni-ci ."
@@ -13,7 +28,7 @@ steps:
   - label: "Upload Ready Pipeline"
     depends_on: image-build
     key: upload-ready-pipeline
-    if: build.branch != "main" && build.pull_request.labels includes "ready"
+    if: __UPLOAD_READY_IF__
     commands:
       - buildkite-agent pipeline upload .buildkite/test-ready.yml
     agents:
@@ -23,17 +38,25 @@ steps:
   - label: "Upload Merge Pipeline"
     depends_on: image-build
     key: upload-merge-pipeline
-    if: build.branch == "main" && build.env("NIGHTLY") != "1"
+    if: __UPLOAD_MERGE_IF__
     commands:
       - buildkite-agent pipeline upload .buildkite/test-merge.yml
     agents:
       queue: "cpu_queue_premerge"

-  # L4 Test — main+NIGHTLY=1 (scheduled), or PR with label nightly-test (e.g. add label then Rebuild)
+  # L4 Test — main+NIGHTLY=1 (scheduled), or PR with specific label (e.g. add label then Rebuild)
   - label: "Upload Nightly Pipeline"
     depends_on: image-build
     key: upload-nightly-pipeline
-    if: '(build.branch == "main" && build.env("NIGHTLY") == "1") || (build.branch != "main" && build.pull_request.labels includes "nightly-test")'
+    if: >-
+      (build.branch == "main" && build.env("NIGHTLY") == "1") ||
+      (build.branch != "main" && (
+        build.pull_request.labels includes "nightly-test" ||
+        build.pull_request.labels includes "omni-test" ||
+        build.pull_request.labels includes "tts-test" ||
+        build.pull_request.labels includes "diffusion-x2iat-test" ||
+        build.pull_request.labels includes "diffusion-x2v-test"
+      ))
     commands:
       - buildkite-agent pipeline upload .buildkite/test-nightly.yml
     agents:
88 changes: 88 additions & 0 deletions .buildkite/scripts/generate-and-upload-nightly-index.sh
@@ -0,0 +1,88 @@
#!/usr/bin/env bash

set -ex

# Generate and upload wheel indices for all vllm-omni wheels in the commit directory.
# This script should run once after all wheels have been built and uploaded.
# All paths are under the omni/ prefix in the vllm-wheels S3 bucket.

# ======== setup ========

BUCKET="vllm-wheels"
INDICES_OUTPUT_DIR="indices"
PYTHON="${PYTHON_PROG:-python3}"
SUBPATH="omni/$BUILDKITE_COMMIT"
S3_COMMIT_PREFIX="s3://$BUCKET/$SUBPATH/"

# detect if python3.12+ is available
has_new_python=$($PYTHON -c "print(1 if __import__('sys').version_info >= (3,12) else 0)")
if [[ "$has_new_python" -eq 0 ]]; then
    # use new python from docker
    docker pull python:3-slim
    PYTHON="docker run --rm --user $(id -u):$(id -g) -v $(pwd):/app -w /app python:3-slim python3"
fi

echo "Using python interpreter: $PYTHON"
echo "Python version: $($PYTHON --version)"

# ======== generate and upload indices ========

# list all wheels in the commit directory
echo "Existing wheels on S3:"
aws s3 ls "$S3_COMMIT_PREFIX" || echo "(no objects found)"
obj_json="objects.json"
aws s3api list-objects-v2 --bucket "$BUCKET" --prefix "$SUBPATH/" --delimiter / --output json > "$obj_json"
mkdir -p "$INDICES_OUTPUT_DIR"

# HACK: we do not need regex module here, but it is required by pre-commit hook
# To avoid any external dependency, we simply replace it back to the stdlib re module
sed -i.bak 's/import regex as re/import re/g' .buildkite/scripts/generate-nightly-index.py && rm -f .buildkite/scripts/generate-nightly-index.py.bak

# Generate indices -- the version is just the commit hash (not omni/{commit})
# because relative paths are computed between the index and wheel directories,
# both of which live under the omni/ prefix in S3.
$PYTHON .buildkite/scripts/generate-nightly-index.py \
    --version "$BUILDKITE_COMMIT" \
    --current-objects "$obj_json" \
    --output-dir "$INDICES_OUTPUT_DIR" \
    --comment "commit $BUILDKITE_COMMIT"

# copy indices to /omni/{commit}/ unconditionally
echo "Uploading indices to $S3_COMMIT_PREFIX"
aws s3 cp --recursive "$INDICES_OUTPUT_DIR/" "$S3_COMMIT_PREFIX"

# copy to /omni/nightly/ when NIGHTLY=1
if [[ "${NIGHTLY:-}" == "1" ]]; then
    echo "Uploading indices to overwrite /omni/nightly/"
    aws s3 cp --recursive "$INDICES_OUTPUT_DIR/" "s3://$BUCKET/omni/nightly/"
fi

# detect version from any wheel in the commit directory
first_wheel_key=$($PYTHON -c "import json; obj=json.load(open('$obj_json')); print(next((c['Key'] for c in obj.get('Contents', []) if c['Key'].endswith('.whl')), ''))")
if [[ -z "$first_wheel_key" ]]; then
    echo "Error: No wheels found in $S3_COMMIT_PREFIX"
    exit 1
fi
first_wheel=$(basename "$first_wheel_key")
aws s3 cp "s3://$BUCKET/${first_wheel_key}" "/tmp/${first_wheel}"
version=$(unzip -p "/tmp/${first_wheel}" '**/METADATA' | grep '^Version: ' | cut -d' ' -f2)
rm -f "/tmp/${first_wheel}"
echo "Version in wheel: $version"
pure_version="${version%%+*}"
echo "Pure version (without variant): $pure_version"

# re-generate and copy to /omni/{version}/ only if it does not have "dev" in the version
if [[ "$version" != *"dev"* ]]; then
    s3_version="v$pure_version"
    echo "Re-generating indices for /omni/$s3_version/"
    rm -rf "${INDICES_OUTPUT_DIR:?}"
    mkdir -p "$INDICES_OUTPUT_DIR"
    # wheel-dir is overridden to be the commit directory, so that the indices point to the correct wheel path
    $PYTHON .buildkite/scripts/generate-nightly-index.py \
        --version "$s3_version" \
        --wheel-dir "$BUILDKITE_COMMIT" \
        --current-objects "$obj_json" \
        --output-dir "$INDICES_OUTPUT_DIR" \
        --comment "version $pure_version"
    aws s3 cp --recursive "$INDICES_OUTPUT_DIR/" "s3://$BUCKET/omni/$s3_version/"
fi
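
Reviewer sketch (not part of the PR): the version handling at the end of this script can be illustrated in Python. The function name `release_index_prefix` and the sample version strings are hypothetical. The sketch assumes the same two rules as the shell code: everything from the first `+` onward is dropped (the effect of `${version%%+*}`), and any version containing `dev` gets no versioned index.

```python
from typing import Optional


def release_index_prefix(version: str) -> Optional[str]:
    """Return the S3 key prefix for a release index, or None for dev builds.

    Mirrors the shell logic above: versions containing "dev" are skipped,
    and the local-version suffix after the first "+" is stripped.
    """
    if "dev" in version:
        return None
    # Equivalent of ${version%%+*}: keep only the text before the first "+".
    pure_version = version.split("+", 1)[0]
    return f"omni/v{pure_version}/"


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for v in ("0.19.0", "0.19.0+cu129", "0.19.1.dev3+g1a2b3c4"):
        print(v, "->", release_index_prefix(v))
```

Under these assumptions a wheel with a local variant suffix and a plain release wheel of the same version map to the same versioned prefix, while dev builds are only reachable through the per-commit and nightly indices.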
192 changes: 192 additions & 0 deletions .buildkite/scripts/generate-nightly-index.py
@@ -0,0 +1,192 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project

import argparse
import json
import re
import sys
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path
from typing import Any
from urllib.parse import quote


def normalize_package_name(name: str) -> str:
    """Normalize package name per PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


if not sys.version_info >= (3, 12):
    raise RuntimeError("This script requires Python 3.12 or higher.")

INDEX_HTML_TEMPLATE = """<!DOCTYPE html>
<html>
<!-- {comment} -->
<meta name="pypi:repository-version" content="1.0">
<body>
{items}
</body>
</html>
"""


@dataclass
class WheelFileInfo:
    package_name: str
    version: str
    build_tag: str | None
    python_tag: str
    abi_tag: str
    platform_tag: str
    filename: str


def parse_from_filename(file: str) -> WheelFileInfo:
    """
    Parse wheel filename per PEP 427:
        {package_name}-{version}(-{build_tag})?-{python_tag}-{abi_tag}-{platform_tag}.whl
    """
    wheel_file_re = re.compile(
        r"^(?P<package_name>.+)-(?P<version>[^-]+?)(-(?P<build_tag>[^-]+))?-(?P<python_tag>[^-]+)-(?P<abi_tag>[^-]+)-(?P<platform_tag>[^-]+)\.whl$"
    )
    match = wheel_file_re.match(file)
    if not match:
        raise ValueError(f"Invalid wheel file name: {file}")

    return WheelFileInfo(
        package_name=match.group("package_name"),
        version=match.group("version"),
        build_tag=match.group("build_tag"),
        python_tag=match.group("python_tag"),
        abi_tag=match.group("abi_tag"),
        platform_tag=match.group("platform_tag"),
        filename=file,
    )


def generate_project_list(package_names: list[str], comment: str = "") -> str:
    """Generate top-level PEP 503 project list HTML."""
    href_tags = []
    for name in sorted(package_names):
        href_tags.append(f' <a href="{name}/">{name}/</a><br/>')
    return INDEX_HTML_TEMPLATE.format(items="\n".join(href_tags), comment=comment)


def generate_package_index(
    wheel_files: list[WheelFileInfo],
    wheel_base_dir: Path,
    index_base_dir: Path,
    comment: str = "",
) -> tuple[str, str]:
    """Generate package index HTML and metadata JSON linking to wheel files."""
    href_tags = []
    metadata = []
    for file in sorted(wheel_files, key=lambda x: x.filename):
        relative_path = wheel_base_dir.relative_to(index_base_dir, walk_up=True) / file.filename
        # handle '+' in URL; avoid double-encoding '/' and '%2B' (AWS S3 behavior)
        file_path_quoted = quote(relative_path.as_posix(), safe=":%/")
        href_tags.append(f' <a href="{file_path_quoted}">{file.filename}</a><br/>')
        file_meta = asdict(file)
        file_meta["path"] = file_path_quoted
        metadata.append(file_meta)
    index_str = INDEX_HTML_TEMPLATE.format(items="\n".join(href_tags), comment=comment)
    metadata_str = json.dumps(metadata, indent=2)
    return index_str, metadata_str


def generate_index(
    whl_files: list[str],
    wheel_base_dir: Path,
    index_base_dir: Path,
    comment: str = "",
):
    """
    Generate PEP 503 index for all wheel files.

    Output structure:
        index_base_dir/
            index.html          # project list linking to vllm-omni/
            vllm-omni/
                index.html      # package index linking to wheel files
                metadata.json   # machine-readable metadata
    """
    parsed_files = [parse_from_filename(f) for f in whl_files]

    if not parsed_files:
        print("No wheel files found, skipping index generation.")
        return

    comment_str = f" ({comment})" if comment else ""
    comment_tmpl = f"Generated on {datetime.now().isoformat()}{comment_str}"

    # Group by normalized package name
    packages: dict[str, list[WheelFileInfo]] = {}
    for file in parsed_files:
        name = normalize_package_name(file.package_name)
        packages.setdefault(name, []).append(file)

    print(f"Found packages: {list(packages.keys())}")

    # Generate per-package index
    for package, files in packages.items():
        package_dir = index_base_dir / package
        package_dir.mkdir(parents=True, exist_ok=True)
        index_str, metadata_str = generate_package_index(files, wheel_base_dir, package_dir, comment)
        with open(package_dir / "index.html", "w") as f:
            f.write(index_str)
        with open(package_dir / "metadata.json", "w") as f:
            f.write(metadata_str)

    # Generate top-level project list
    project_list_str = generate_project_list(sorted(packages.keys()), comment_tmpl)
    with open(index_base_dir / "index.html", "w") as f:
        f.write(project_list_str)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate PEP 503 wheel index from S3 object listing.")
    parser.add_argument("--version", type=str, required=True, help="Version string (e.g., commit hash)")
    parser.add_argument("--current-objects", type=str, required=True, help="Path to JSON from S3 list-objects-v2")
    parser.add_argument("--output-dir", type=str, required=True, help="Directory to write index files")
    parser.add_argument("--wheel-dir", type=str, default=None, help="Wheel directory (defaults to --version)")
    parser.add_argument("--comment", type=str, default="", help="Comment for generated HTML")

    args = parser.parse_args()

    version = args.version
    if "\\" in version or "/" in version:
        raise ValueError("Version string must not contain slashes or backslashes.")

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    with open(args.current_objects) as f:
        current_objects: dict[str, list[dict[str, Any]]] = json.load(f)

    wheel_files = [
        item["Key"].split("/")[-1] for item in current_objects.get("Contents", []) if item["Key"].endswith(".whl")
    ]

    print(f"Found {len(wheel_files)} wheel files for version {version}: {wheel_files}")

    # For release versions, filter to only matching non-dev wheels
    PY_VERSION_RE = re.compile(r"^\d+\.\d+\.\d+([a-zA-Z0-9.+-]*)?$")
    if PY_VERSION_RE.match(version):
        wheel_files = [f for f in wheel_files if version in f and "dev" not in f]
        print(f"Non-nightly version detected, wheel files used: {wheel_files}")
    else:
        print("Nightly version detected, keeping all wheel files.")

    wheel_dir = (args.wheel_dir or version).strip().rstrip("/")
    wheel_base_dir = Path(output_dir).parent / wheel_dir
    index_base_dir = Path(output_dir)

    generate_index(
        whl_files=wheel_files,
        wheel_base_dir=wheel_base_dir,
        index_base_dir=index_base_dir,
        comment=args.comment.strip(),
    )
    print(f"Successfully generated index in {output_dir}")
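
Reviewer sketch (not part of the PR): to illustrate how the links in the package index are formed, here is a minimal example. `wheel_href` is a hypothetical helper, and the commit directory and wheel filename are made up. `posixpath.relpath` stands in for `Path.relative_to(..., walk_up=True)` so that the sketch also runs on Python versions older than 3.12. The quoting uses the same `safe=":%/"` argument as the script, so a `+` in the filename is encoded as `%2B`.

```python
import posixpath
from urllib.parse import quote


def wheel_href(wheel_dir: str, index_dir: str, filename: str) -> str:
    """Build the relative, URL-quoted link from a package index to a wheel.

    wheel_dir and index_dir are relative paths under the same root,
    e.g. "<commit>" and "indices/vllm-omni".
    """
    # Walk up from the index directory to the wheel directory.
    rel_dir = posixpath.relpath(wheel_dir, start=index_dir)
    # Keep ':', '%' and '/' unescaped; '+' becomes '%2B'.
    return quote(posixpath.join(rel_dir, filename), safe=":%/")


if __name__ == "__main__":
    # Hypothetical commit directory and wheel name.
    href = wheel_href(
        "abc1234",
        "indices/vllm-omni",
        "vllm_omni-0.19.0+cu129-py3-none-any.whl",
    )
    print(href)  # ../../abc1234/vllm_omni-0.19.0%2Bcu129-py3-none-any.whl
```

Because the link climbs two levels before descending into the commit directory, an index uploaded under either `omni/{commit}/vllm-omni/` or `omni/{version}/vllm-omni/` would resolve to the wheel stored in `omni/{commit}/`. This matches the intent stated in the shell script's comment about overriding `--wheel-dir`.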