Changes from all commits
1287 commits
5ae638c
Adding deterministic lora benchmarking to vLLM Bench (#36057)
RonaldBXu Mar 18, 2026
1d5ed78
Add API docs link if the CLI arg is a config class (#37432)
hmellor Mar 18, 2026
9496288
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec…
orozery Mar 18, 2026
7df6065
[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E through…
yewentao256 Mar 18, 2026
3fff62b
[Misc] Clean up model registry (#37457)
DarkLight1337 Mar 18, 2026
d68954f
[Model] Remove unnecessary processor definition for Nemotron Parse (#…
DarkLight1337 Mar 18, 2026
7606842
[bugfix][async scheduling] fix extra cuda context in device 0 with EP…
youkaichao Mar 18, 2026
d047b0e
[Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp …
cnyvfang Mar 18, 2026
bf62ca1
Fix models which use `layer_type_validation` for Transformers v5 (#37…
hmellor Mar 18, 2026
b6691c7
[Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795) (#…
JartX Mar 18, 2026
bd75fdd
chunk parakeet into 30s clips to prevent OOMs on long audios (#36671)
netanel-haber Mar 18, 2026
de69384
[V0 Deprecation] Deprecate virtual engine (#37195)
yewentao256 Mar 18, 2026
7c48273
fix(worker): optimize swap_states to copy only active token prefixes …
pjo256 Mar 18, 2026
f67b81c
[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928)
WoosukKwon Mar 18, 2026
8ea3b29
[Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465)
mgoin Mar 18, 2026
a35a825
[Model Runner V2] Spec decode rejection sampler greedy support (#37238)
TheEpicDolphin Mar 18, 2026
7734ddb
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache…
andylolu2 Mar 18, 2026
7857e30
[Bugfix] Expand quantization method support in perf metrics (#37231)
thillai-c Mar 18, 2026
60f9b46
[XPU]Unify xpu test dependencies in dockerfile.xpu (#36477)
1643661061leo Mar 19, 2026
12918ee
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from C…
elvircrn Mar 19, 2026
45e72ea
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
SageMoore Mar 19, 2026
105e79e
[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync ex…
hao-aaron Mar 19, 2026
33f1588
[Model Runner V2] Spec decode rejection sampler logprobs support (#37…
TheEpicDolphin Mar 19, 2026
6ce502d
[bug] Fix deadlock with pause resume and collective_rpc (#37024)
hao-aaron Mar 19, 2026
a2c303f
[Perf] Optimize token_embed for pooling models, 1.0% token throughput…
yewentao256 Mar 19, 2026
cde956d
[ROCm] issue management - request information for bug issues on ROCm …
hongxiayang Mar 19, 2026
e8f5624
[Refactor] Relocate endpoint tests to mirror serving code directory s…
sfeng33 Mar 19, 2026
4fc7ed7
[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310)
ZhanqiuHu Mar 19, 2026
8f82769
fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic str…
cdpath Mar 19, 2026
37dec84
[Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425)
ZeldaHuang Mar 19, 2026
c3f4051
Support temporal compression for Nemotron-3-VL videos (#36808)
collinmccarthy Mar 19, 2026
4582837
[CI] Fix wrong path test file, missing `rlhf_async_new_apis.py` (#37532)
tjtanaa Mar 19, 2026
9d4f162
[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_av…
jikunshang Mar 19, 2026
f18494b
fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369)
yassha Mar 19, 2026
1ac2582
[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_…
Duyi-Wang Mar 19, 2026
81d2f2d
[Bugfix] Fix Nemotron Parse loading (#37407)
DarkLight1337 Mar 19, 2026
eface9c
[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile …
bigPYJ1151 Mar 19, 2026
a7ce132
Don't log `exc_info` when vLLM tries to doenload a file that doesn't …
hmellor Mar 19, 2026
2b9126d
[Docs] Reorganize pooling docs. (#35592)
noooop Mar 19, 2026
e98c6ff
[CI/Build] Split out MM pooling tests (#37542)
DarkLight1337 Mar 19, 2026
e0c7759
[Model] Remove unnecessary `get_language_model` (#37545)
DarkLight1337 Mar 19, 2026
3fca01f
Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu…
xueliangyang-oeuler Mar 19, 2026
be8cb39
[CI] Merge `cleanup_pr_body.yml` and `reminder_comment.yml` (#37552)
hmellor Mar 19, 2026
571f498
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id…
DorBernsohn Mar 19, 2026
d9fd057
[Misc] Clean up processing logic (#37541)
DarkLight1337 Mar 19, 2026
8bd9957
Stop bench CLI from recursively casting all configs to `dict` (#37559)
hmellor Mar 19, 2026
65d7c98
Cap the number of API servers to 1 when using Elastic EP. (#37466)
SageMoore Mar 19, 2026
6b5eb81
[LoRA] Minor improvements to LoRA log (#37557)
jeejeelee Mar 19, 2026
8a2c186
Remove deprecated reasoning_content message field(part-2) (#37480)
ikaadil Mar 19, 2026
b644f06
[1/n] Migrate permute_cols to libtorch stable ABI (#31509)
mikaylagawarecki Mar 19, 2026
e1290d0
[MRV2] Use fp32 for draft logits (#37526)
WoosukKwon Mar 19, 2026
029f8f2
[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#3…
wzhao18 Mar 19, 2026
42e0fb3
[Misc] Cleanup more configs and processors (#37560)
DarkLight1337 Mar 19, 2026
5e502de
Run MacOS smoke test on daily `cron` job instead of every commit (#37…
hmellor Mar 19, 2026
e2018d5
[CI] Gate pre-commit on `ready` label or number of contributions (#37…
hmellor Mar 19, 2026
4784531
[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37…
fadara01 Mar 19, 2026
0c86721
[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation …
chaunceyjiang Mar 19, 2026
d8b5dbd
[torch.compile][BE][Multimodal] Remove requirement to set_model_tag t…
Lucaskabela Mar 19, 2026
5713e22
[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM i…
EdalatiAli Mar 19, 2026
ad10f0c
Fix `SpeculatorsConfig` now that `PreTrainedConfig` is a `dataclass` …
hmellor Mar 19, 2026
eff661e
[CI] Add retry with 4x backoff to HTTP fetches for transient failures…
AndreasKaratzas Mar 19, 2026
d494a59
[Log] Log once in local node by default (#37568)
yewentao256 Mar 19, 2026
9c757ab
[MoE Refactor] DefaultMoERunner simplifcation (#33049)
bnellnm Mar 19, 2026
35af51c
[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
AndreasKaratzas Mar 19, 2026
da0a6e9
Comment fix for async rl example (#35244)
hao-aaron Mar 19, 2026
2f237e9
[MoE Refactor] Rename "naive" all2all backend (#36294)
bnellnm Mar 19, 2026
a06ca92
test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064)
laithsakka Mar 19, 2026
5362831
[Performance] Enable Triton autotuning disk cache by default (#37188)
arpera Mar 19, 2026
95f8fe0
[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_…
rasmith Mar 19, 2026
18778ff
Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448)
jhsmith409 Mar 19, 2026
1040ee7
[Refactor] Remove dead code in pooling model (#37572)
yewentao256 Mar 19, 2026
6817ead
[Bug] Fix EmbedIOprocessor "classify" <-> "embed" (#37573)
yewentao256 Mar 19, 2026
2b119b2
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056)
sfeng33 Mar 19, 2026
0252f0a
[ROCm][Bugfix] fix cache block size mismatch for aiter unified attent…
divakar-amd Mar 20, 2026
c961d01
Fix DP coordinator ZMQ port TOCTOU (#37452)
itayalroy Mar 20, 2026
5b1a6c1
[CI] Update mergify tool-calling label paths (#37478)
sfeng33 Mar 20, 2026
829e97e
fix: disambiguate multimodal prefix cache keys (#36708)
tianshu-Michael-yu Mar 20, 2026
ee33b58
[Feat] Enable CompressedTensorW4A8Int for XPU (#37207)
tianmu-li Mar 20, 2026
42cbc56
[compile][graph_partition]Add tensor size handling (#36038)
fxdawnn Mar 20, 2026
71eaeb2
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
jeejeelee Mar 20, 2026
2b6f6a6
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
sfeng33 Mar 20, 2026
10d120a
fix(xpu): Re-compute compile ranges after platform-specific config up…
Liangyx2 Mar 20, 2026
d0f6b52
[Model Runner V2] fix draft attention metadata generation (#37364)
TheEpicDolphin Mar 20, 2026
339ca53
[XPU] Automatically detect target platform as XPU in build. (#37634)
ccrhx4 Mar 20, 2026
c56ebb4
[Refactor] Relocate entrypoint tests to match serving code structure …
sfeng33 Mar 20, 2026
ed6b19b
[Model] Refactor Step3-VL processor to HF style (#37579)
DarkLight1337 Mar 20, 2026
cb7275e
[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293)
Wangbei25 Mar 20, 2026
b8a69c2
[XPU] bump vllm-xpu-kernels to v0.1.4 (#37641)
jikunshang Mar 20, 2026
c7c1d52
[Bug] Fix FlashInfer allreduce fusion workspace uninitialized error (…
wzhao18 Mar 20, 2026
6c310a3
[CI] Removing deprecated rlhf examples reference (#37585)
AndreasKaratzas Mar 20, 2026
6043223
[Model Runner V2] Fix draft logits not populated during cudagraph rep…
TheEpicDolphin Mar 20, 2026
23dc6f8
[Model] Deprecate the score task (this will not affect users). (#37537)
noooop Mar 20, 2026
5895abd
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list …
AndreasKaratzas Mar 20, 2026
d42e763
[ROCm][CI] Remove deepep DBO tests on gfx90a (#37614)
AndreasKaratzas Mar 20, 2026
2a6d28a
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible…
AndreasKaratzas Mar 20, 2026
eca5c2e
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/…
sfeng33 Mar 20, 2026
82da88f
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypo…
sfeng33 Mar 20, 2026
711f0ed
[Misc] Use logger.info_once for auto tool choice log message (#37661)
chaunceyjiang Mar 20, 2026
aed18ab
[UX] Enable torch_profiler_with_stack (#37571)
jeejeelee Mar 20, 2026
b77639c
Fix attribute error in `isaac_patch_hf_runner` (#37685)
hmellor Mar 20, 2026
7aa1c49
[Model] Add LFM2-ColBERT-350M support (#37528)
ieBoytsov Mar 20, 2026
773dff2
[ROCm][Quantization] make quark ocp mx dtype parser robust for weight…
xuebwang-amd Mar 20, 2026
052ddda
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34…
laudney Mar 20, 2026
a3f28e0
[Bugfix] Reject channelwise quantization (group_size <= 0) in Exllama…
mgehre-amd Mar 20, 2026
cadc887
[Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-…
mgehre-amd Mar 20, 2026
35cbc7e
[Pixtral] Enable Pixtral language model support Eagle3 (#37182)
Flechman Mar 20, 2026
57797b9
[compile] Fix aot test failures with torch 2.12. (#37604)
zhxchen17 Mar 20, 2026
5498087
[Metrics] Some small refactoring for better maintainability (#33898)
hickeyma Mar 20, 2026
9d78f45
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFAC…
zhxchen17 Mar 20, 2026
39bee87
Fix various config related issues for Transformers v5 (#37681)
hmellor Mar 20, 2026
1104d26
Add prefill RoPE + KV cache fusion for MLA
khairulkabir1661 Mar 20, 2026
34ce09f
Remove commented-out debug logging statements
khairulkabir1661 Mar 24, 2026
ad903df
Rename _run_atom_fused_decode to _run_aiter_fused_decode
khairulkabir1661 Mar 24, 2026
93892ce
Fix: Set _fused_prefill_kernel for both FP4 and FP8 paths
khairulkabir1661 Mar 24, 2026
ff56d57
Remove VLLM_USE_AITER_PREFILL_FUSED flag - always use prefill fusion
khairulkabir1661 Mar 24, 2026
d939a77
Remove useless else block and slot_mapping check in prefill fusion
khairulkabir1661 Mar 24, 2026
1194461
Update outdated comments in prefill fusion code
khairulkabir1661 Mar 24, 2026
3c1ddeb
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
AndreasKaratzas Mar 20, 2026
7bd05ae
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_ke…
xyang16 Mar 20, 2026
05023af
[MRV2] Avoid recompilation of _gather_block_tables_kernel (#37645)
WoosukKwon Mar 20, 2026
a16c7d7
fix CUDAGraph memory being counted twice (#37426)
panpan0000 Mar 20, 2026
6242a8d
[Attention] Support distinguishing between short extends and decodes …
LucasWilkinson Mar 20, 2026
f5f8d1f
refactor: abstract deepgemm support into platform (#37519)
SherryC41 Mar 20, 2026
b896986
[Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention…
Young-Leo Mar 20, 2026
89a86a8
[Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693)
DarkLight1337 Mar 20, 2026
2d9b002
[compile] Initialize passes at VllmBackend init (#35216)
angelayi Mar 20, 2026
b3a6170
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#3759…
vadiklyutiy Mar 20, 2026
fdbc3b2
[ROCm][CI] Setting some mi325_4 tests back to optional (in parity wit…
AndreasKaratzas Mar 20, 2026
2e8dd5c
[Model Runner V2] Support Streaming Inputs (#37028)
santiramos27 Mar 20, 2026
75695eb
[Refactor] Remove unused dead code (#36171)
yewentao256 Mar 20, 2026
370e22b
[BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks …
kjiang249 Mar 20, 2026
fdc986f
elastic_ep: Fix issues with repeated scale up/down cycles (#37131)
itayalroy Mar 20, 2026
cb91d91
Add get_device_uuid for rocm (#37694)
tmm77 Mar 21, 2026
1fd9096
[Frontend] Remove librosa from audio dependency (#37058)
Isotr0py Mar 21, 2026
65dc873
[MoE Refactor] Mxfp4 oracle rebased (#37128)
zyongye Mar 21, 2026
0b37381
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collec…
AndreasKaratzas Mar 21, 2026
f26b6b0
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
simon-mo Mar 21, 2026
c8c9640
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
AndreasKaratzas Mar 21, 2026
61f9480
[Responses API] Add kv_transfer_params for PD disaggregation (#37424)
bongwoobak Mar 21, 2026
d424e71
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list …
AndreasKaratzas Mar 21, 2026
88fb4be
[ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. (#34692)
lcskrishna Mar 21, 2026
b50712a
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create()…
fuscof-ibm Mar 21, 2026
1564b3f
[Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) (#…
mmangkad Mar 21, 2026
2f58ec7
[Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning (#37756)
mmangkad Mar 21, 2026
63f68bd
Add tensor IPC transfer mechanism for multimodal data (#32104)
brandonpelfrey Mar 21, 2026
7f5aaea
Consolidate AWQ quantization into single awq_marlin.py file
robertgshaw2-redhat Mar 21, 2026
70402ea
Revert "Consolidate AWQ quantization into single awq_marlin.py file" …
robertgshaw2-redhat Mar 21, 2026
45b3a41
[Quantization][Deprecation] Remove PTPC FP8 (#32700)
robertgshaw2-redhat Mar 21, 2026
0c05411
[MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759)
robertgshaw2-redhat Mar 21, 2026
aa86d17
[ROCm][CI] close missing quote in kernels/moe block in run-amd-test.s…
AndreasKaratzas Mar 22, 2026
f29ea49
[Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing…
AndreasKaratzas Mar 22, 2026
6b669c7
[ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717)
AndreasKaratzas Mar 22, 2026
df26f87
[CI] Skip ISAAC multimodal tests due to broken upstream HF model weig…
AndreasKaratzas Mar 22, 2026
6a715d3
[Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fal…
AndreasKaratzas Mar 22, 2026
f15d6b1
[Perf] Optimize glm4.xv VIT (#37779)
KKSK-DON Mar 22, 2026
deb410d
[ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks…
AndreasKaratzas Mar 22, 2026
eea035a
[ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#…
AndreasKaratzas Mar 22, 2026
e7f9412
[ROCm][CI] Added missing resampy dependency for MM audio tests (#37778)
AndreasKaratzas Mar 22, 2026
5be8424
[ROCm][CI] Make some duplicated tests optional so that they are only …
AndreasKaratzas Mar 22, 2026
4df693b
[MoE] Move PF Methods to Folder (#35927)
robertgshaw2-redhat Mar 22, 2026
aedbbf0
[ROCm][CI] Stabilize ROCm speech-to-text translation test with lower …
AndreasKaratzas Mar 22, 2026
1d689f3
[Model Runner V2] Support multi-modal embeddings for spec decode mode…
TheEpicDolphin Mar 22, 2026
ebb768b
[Bug] Fix fp8 deepgemm batch invariant (#37718)
yewentao256 Mar 22, 2026
1651dc7
[Test] Only Run MLA model when user explicitly set for batch invarian…
yewentao256 Mar 22, 2026
3e29ae5
Enable `NemotronHPuzzle` + `NemotronHMTP` (#37803)
netanel-haber Mar 22, 2026
7c28c8d
[MRV2] Skip hidden states allocation for PW CUDA graphs (#37818)
WoosukKwon Mar 22, 2026
362c784
[Bigfix]fix lora test by pass padded size back to the layer (#37811)
zyongye Mar 22, 2026
4709d6e
[MRV2] Use FP64 for Gumbel noise (#37798)
WoosukKwon Mar 22, 2026
fa4a2b6
[Model Runner V2] Enable piecewise CUDA graphs for pipeline paralleli…
ZhanqiuHu Mar 22, 2026
932f6b5
[MRV2] Enable PP CUDA graph test (#37830)
WoosukKwon Mar 22, 2026
b668312
Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
lashahub Mar 23, 2026
1071e0f
always use `embed&token_classify` for bge-m3 (#37632)
staugust Mar 23, 2026
6381a60
[Test] Consolidate tool parser unit tests to tests/tool_parsers (#37834)
bbrowning Mar 23, 2026
7219594
[CI/Build][LoRA] Update Qwen35 LoRA testing (#37816)
jeejeelee Mar 23, 2026
ce1fd85
[Feature] ViT Full CUDA Graph (#35963)
b-mu Mar 23, 2026
bff9926
update doc for online fp8 quantization (#37851)
yma11 Mar 23, 2026
c1ccebb
[ROCm][Refactor] Enable AWQMarlinConfig on ROCm to use choose_mp_line…
mgehre-amd Mar 23, 2026
9b73a81
[Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' …
r266-tech Mar 23, 2026
56d4796
[Bugfix] Store Qwen3Next A_log in fp32 (#37810)
effortprogrammer Mar 23, 2026
64a0fde
[Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 (#37338)
arpera Mar 23, 2026
ce7bd8a
[ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs (#3…
ChuanLi1101 Mar 23, 2026
0fedfb8
[Misc]Update gitignore (#37863)
wangxiyuan Mar 23, 2026
0a2f1d1
[FP8]add FP8 WoQ kernel abstraction. (#32929)
jikunshang Mar 23, 2026
fd6ed5e
[Frontend][Responses API] Fix arrival_time recording for TTFT on init…
qandrew Mar 23, 2026
b1e9c94
[XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle (#37784)
jikunshang Mar 23, 2026
549abf0
[Bugfix] Fix CPU backend crash in KV cache block zeroing (#37550)
DorBernsohn Mar 23, 2026
877a94d
[Bugfix][LoRA] Fix incorrect LoRA Log (#37877)
jeejeelee Mar 23, 2026
7246a3e
[ROCm] fix sleep mode not releasing GPU memory problem on ROCm (#37533)
aaab8b Mar 23, 2026
dc14ca1
[Mypy] Fix mypy for `vllm/config` (#37808)
yewentao256 Mar 23, 2026
3c719bf
[Bugfix] RoBERTa position_id accumulation in CUDA graph padding regio…
yanghui1-arch Mar 23, 2026
543f8ae
[Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding …
he-yufeng Mar 23, 2026
5bd7ef4
Use lazy graph module during split_module to defer recompile() (#37609)
angelayi Mar 23, 2026
7ad13db
[CI][PD] Add Hybrid SSM integration tests to CI (#37657)
NickLucche Mar 23, 2026
a27ca30
[CI] split Entrypoints Integration (API Server 1) into 3 jobs (#37882)
jikunshang Mar 23, 2026
31a59ef
[MRV2] Consider spec decoding in warmup (#37812)
WoosukKwon Mar 23, 2026
f4cf9d1
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (…
MatthewBonanni Mar 23, 2026
b96f2e4
[Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (…
kylesayrs Mar 23, 2026
f13c263
[Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision (#36725)
robertgshaw2-redhat Mar 23, 2026
1095443
[Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM…
yzong-rh Mar 23, 2026
2aeeb7e
[Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unk…
WindChimeRan Mar 23, 2026
31bcaf8
[CI] Split V1 Others into 3 separate jobs (#37016)
khluu Mar 23, 2026
b00fcf3
[Test] E2E Nemotron-3-Super tests (#36803)
roikoren755 Mar 24, 2026
82ddc58
[Model Runner V2] Gather multimodal embeddings before draft model pos…
TheEpicDolphin Mar 24, 2026
9a26544
[CI] Add batch invariant test: Block FP8 + small MOE (#37895)
yewentao256 Mar 24, 2026
2d96006
[ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs (…
AndreasKaratzas Mar 24, 2026
f8b7f26
[V0 Deprecation] Refactor kv cache from list to element (#37487)
yewentao256 Mar 24, 2026
a6ae504
Downsize CPU jobs to use small queue (#37913)
khluu Mar 24, 2026
133c9ac
[release] Move agent queue to Release cluster queues (#37783)
khluu Mar 24, 2026
98ce193
[Frontend][Bugfix] Pass default_chat_template_kwargs to AnthropicServ…
jetxa Mar 24, 2026
2e237a2
[KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove B…
ronensc Mar 24, 2026
4ff3774
Fix tool_parser_cls type annotation from Callable to type[ToolParser]…
sfeng33 Mar 24, 2026
188b76a
[Docs] Fix build (#37991)
hmellor Mar 24, 2026
65a5195
[EPLB] Remove main waits in case of slow EPLB (#36271)
ilmarkov Mar 24, 2026
20fa06f
[Bugfix] Suppress spurious CPU KV cache warning in `launch render` (#…
sagearc Mar 24, 2026
404c267
[Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987)
bigPYJ1151 Mar 24, 2026
5d57802
[Deprecate] Deprecate pooling multi task support. (#37956)
noooop Mar 24, 2026
1e443fa
Update new contributor message (#37999)
hmellor Mar 24, 2026
98a2dec
[Feature] limit thinking tokens (hard limit) (#20859)
llsj14 Mar 24, 2026
07165ab
[Core] add option to schedule requests based on full ISL (#37307)
DanBlanaru Mar 24, 2026
af3a594
[Mypy] Fix mypy for `vllm/model_executor` (except `vllm/model_executo…
hmellor Mar 24, 2026
810bd72
[XPU] Support Intel XPU hardware information collection in usage stat…
1643661061leo Mar 24, 2026
3cd3f9c
[Bugfix] Force continuous usage stats when CLI override is enabled (#…
dsingal0 Mar 24, 2026
e909077
Fix Mamba state corruption from referencing stale block table entries…
minosfuture Mar 24, 2026
14fdba2
docs: fix broken offline inference paths in documentation (#37998)
vineetatiwari27 Mar 24, 2026
b523a51
[Bugfix] Fix structured output crash on CPU due to pin_memory=True (#…
wjhrdy Mar 24, 2026
817d676
[Model] Add Granite 4.0 1B speech to supported models (#38019)
NickCao Mar 24, 2026
b1b380f
[BugFix] Fix order of compile logging (#38012)
zou3519 Mar 24, 2026
1053702
[BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015)
zou3519 Mar 24, 2026
8fbbb35
[Bugfix] Pass hf_token through config loading paths for gated model s…
javierdejesusda Mar 24, 2026
3e3f7ee
[FlexAttention] allow custom mask mod (#37692)
liangel-02 Mar 24, 2026
89c7920
Add Ubuntu 24.04 support for Docker builds (#35386)
aasgaonkar Mar 24, 2026
ab1a479
[Model Runner V2][Minor] Simplify PP logic (#38031)
njhill Mar 24, 2026
a3620b3
[MRV2] Fix for DS v3.2 (#38030)
WoosukKwon Mar 24, 2026
ad6f941
[UX] Add flashinfer-cubin as CUDA default dep (#37233)
mgoin Mar 24, 2026
178f361
Make microbatch optimization (DBO) work with general models (#37926)
0xjunhao Mar 24, 2026
1020dd5
Remove unused rope_applied parameter
khairulkabir1661 Mar 26, 2026
4536d90
Restore upstream MLACommonImpl fixes unrelated to AITER
khairulkabir1661 Mar 26, 2026
b8286e2
Restore upstream changes in backend/metadata sections (after line 1508)
khairulkabir1661 Mar 26, 2026
6bc234a
Clean up comments in unified_mla_attention_with_output
khairulkabir1661 Mar 26, 2026
8e1033d
Clean up comments in unified_mla_kv_cache_update
khairulkabir1661 Mar 26, 2026
6c1ca2b
Simplify unified_mla_kv_cache_update docstring
khairulkabir1661 Mar 26, 2026
f37ec12
Clean up comments in unified_mla_attention (lines 1248-1289)
khairulkabir1661 Mar 26, 2026
5ac35c5
Clean up comments in decode path (lines 832-892)
khairulkabir1661 Mar 26, 2026
4a9f6fd
Clean up comments in prefill/decode paths (lines 728-817)
khairulkabir1661 Mar 26, 2026
3e29061
Clean up forward_impl signature comments (lines 654-671)
khairulkabir1661 Mar 26, 2026
4ccaf67
Clean up custom ops path comments (lines 602-612)
khairulkabir1661 Mar 26, 2026
42024bd
Clean up forward signature and context comments (lines 551-567)
khairulkabir1661 Mar 26, 2026
6ba2d98
Clean up AITER kernel init comments (lines 452-525)
khairulkabir1661 Mar 26, 2026
eb7db05
Clean up __init__ parameter comments (lines 305-325)
khairulkabir1661 Mar 26, 2026
f007215
Clean up _run_aiter_fused_decode comments (lines 920-1006)
khairulkabir1661 Mar 26, 2026
2 changes: 1 addition & 1 deletion .buildkite/hardware_tests/amd.yaml
Original file line number Diff line number Diff line change
@@ -10,7 +10,7 @@ steps:
docker build
--build-arg max_jobs=16
--build-arg REMOTE_VLLM=1
--build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942'
--build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942;gfx950'
--build-arg VLLM_BRANCH=$BUILDKITE_COMMIT
--tag "rocm/vllm-ci:${BUILDKITE_COMMIT}"
-f docker/Dockerfile.rocm
19 changes: 13 additions & 6 deletions .buildkite/hardware_tests/cpu.yaml
@@ -3,7 +3,6 @@ depends_on: []
steps:
- label: CPU-Kernel Tests
depends_on: []
soft_fail: true
device: intel_cpu
no_plugin: true
source_file_dependencies:
@@ -21,9 +20,21 @@ steps:
pytest -x -v -s tests/kernels/moe/test_cpu_fused_moe.py
pytest -x -v -s tests/kernels/test_onednn.py"

- label: CPU-Compatibility Tests
depends_on: []
device: intel_cpu
no_plugin: true
source_file_dependencies:
- cmake/cpu_extension.cmake
- setup.py
- vllm/platforms/cpu.py
commands:
- |
bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
bash .buildkite/scripts/hardware_ci/run-cpu-compatibility-test.sh"

- label: CPU-Language Generation and Pooling Model Tests
depends_on: []
soft_fail: true
device: intel_cpu
no_plugin: true
source_file_dependencies:
@@ -39,7 +50,6 @@

- label: CPU-Quantization Model Tests
depends_on: []
soft_fail: true
device: intel_cpu
no_plugin: true
source_file_dependencies:
@@ -59,7 +69,6 @@

- label: CPU-Distributed Tests
depends_on: []
soft_fail: true
device: intel_cpu
no_plugin: true
source_file_dependencies:
@@ -78,7 +87,6 @@

- label: CPU-Multi-Modal Model Tests %N
depends_on: []
soft_fail: true
device: intel_cpu
no_plugin: true
source_file_dependencies:
@@ -93,7 +101,6 @@

- label: "Arm CPU Test"
depends_on: []
soft_fail: true
device: arm_cpu
no_plugin: true
commands:
16 changes: 5 additions & 11 deletions .buildkite/image_build/image_build.sh
@@ -8,7 +8,7 @@ clean_docker_tag() {
}

print_usage_and_exit() {
echo "Usage: $0 <registry> <repo> <commit> <branch> <vllm_use_precompiled> <vllm_merge_base_commit> <cache_from> <cache_to>"
echo "Usage: $0 <registry> <repo> <commit> <branch> <image_tag> [<image_tag_latest>]"
exit 1
}

@@ -151,15 +151,15 @@ print_bake_config() {
docker buildx bake -f "${VLLM_BAKE_FILE_PATH}" -f "${CI_HCL_PATH}" --print "${TARGET}" | tee "${BAKE_CONFIG_FILE}" || true
echo "Saved bake config to ${BAKE_CONFIG_FILE}"
echo "--- :arrow_down: Uploading bake config to Buildkite"
buildkite-agent artifact upload "${BAKE_CONFIG_FILE}"
(cd "$(dirname "${BAKE_CONFIG_FILE}")" && buildkite-agent artifact upload "$(basename "${BAKE_CONFIG_FILE}")")
}

#################################
# Main Script #
#################################
print_instance_info

if [[ $# -lt 7 ]]; then
if [[ $# -lt 5 ]]; then
print_usage_and_exit
fi

@@ -168,10 +168,8 @@ REGISTRY=$1
REPO=$2
BUILDKITE_COMMIT=$3
BRANCH=$4
VLLM_USE_PRECOMPILED=0
VLLM_MERGE_BASE_COMMIT=""
IMAGE_TAG=$7
IMAGE_TAG_LATEST=${8:-} # only used for main branch, optional
IMAGE_TAG=$5
IMAGE_TAG_LATEST=${6:-} # only used for main branch, optional

# build config
TARGET="test-ci"
@@ -198,17 +196,13 @@ export CACHE_FROM
export CACHE_FROM_BASE_BRANCH
export CACHE_FROM_MAIN
export CACHE_TO
export VLLM_USE_PRECOMPILED
export VLLM_MERGE_BASE_COMMIT

# print args
echo "--- :mag: Arguments"
echo "REGISTRY: ${REGISTRY}"
echo "REPO: ${REPO}"
echo "BUILDKITE_COMMIT: ${BUILDKITE_COMMIT}"
echo "BRANCH: ${BRANCH}"
echo "VLLM_USE_PRECOMPILED: ${VLLM_USE_PRECOMPILED}"
echo "VLLM_MERGE_BASE_COMMIT: ${VLLM_MERGE_BASE_COMMIT}"
echo "IMAGE_TAG: ${IMAGE_TAG}"
echo "IMAGE_TAG_LATEST: ${IMAGE_TAG_LATEST}"

3 changes: 1 addition & 2 deletions .buildkite/image_build/image_build.yaml
@@ -5,8 +5,7 @@ steps:
depends_on: []
timeout_in_minutes: 600
commands:
- if [[ "$BUILDKITE_BRANCH" != "main" ]]; then .buildkite/image_build/image_build.sh $REGISTRY $REPO $BUILDKITE_COMMIT $BRANCH $VLLM_USE_PRECOMPILED $VLLM_MERGE_BASE_COMMIT $IMAGE_TAG; fi
- if [[ "$BUILDKITE_BRANCH" == "main" ]]; then .buildkite/image_build/image_build.sh $REGISTRY $REPO $BUILDKITE_COMMIT $BRANCH $VLLM_USE_PRECOMPILED $VLLM_MERGE_BASE_COMMIT $IMAGE_TAG $IMAGE_TAG_LATEST; fi
- if [[ "$BUILDKITE_BRANCH" == "main" ]]; then .buildkite/image_build/image_build.sh $REGISTRY $REPO $BUILDKITE_COMMIT $BRANCH $IMAGE_TAG $IMAGE_TAG_LATEST; else .buildkite/image_build/image_build.sh $REGISTRY $REPO $BUILDKITE_COMMIT $BRANCH $IMAGE_TAG; fi
retry:
automatic:
- exit_status: -1 # Agent was lost
14 changes: 6 additions & 8 deletions .buildkite/image_build/image_build_cpu.sh
@@ -11,10 +11,10 @@ REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu) ]]; then
if [[ -z $(docker manifest inspect "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-cpu) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
@@ -24,13 +24,11 @@ fi
# build
docker build --file docker/Dockerfile.cpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--build-arg VLLM_CPU_AVX512BF16=true \
--build-arg VLLM_CPU_AVX512VNNI=true \
--build-arg VLLM_CPU_AMXBF16=true \
--tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu \
--build-arg buildkite_commit="$BUILDKITE_COMMIT" \
--build-arg VLLM_CPU_X86=true \
--tag "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-cpu \
--target vllm-test \
--progress plain .

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu
docker push "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-cpu
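The quoting changes in these build scripts guard against word splitting if a value ever contains whitespace. A minimal sketch of the failure mode, using a hypothetical `REGISTRY` value (not one from the actual CI config):

```shell
#!/bin/sh
# Hypothetical value containing a space, to show why quoting matters.
REGISTRY="public.ecr.aws/my registry"

# Unquoted: the shell splits the value into two words.
unquoted_count=$(set -- $REGISTRY; echo $#)

# Quoted: the value stays a single word, as the fixed scripts intend.
quoted_count=$(set -- "$REGISTRY"; echo $#)

echo "unquoted=$unquoted_count quoted=$quoted_count"
```

With the unquoted form, `docker login` and `docker push` would receive two arguments instead of one; the quoted form passes the value through intact.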
10 changes: 5 additions & 5 deletions .buildkite/image_build/image_build_cpu_arm64.sh
@@ -11,10 +11,10 @@ REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu) ]]; then
if [[ -z $(docker manifest inspect "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-arm64-cpu) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
@@ -24,10 +24,10 @@ fi
# build
docker build --file docker/Dockerfile.cpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu \
--build-arg buildkite_commit="$BUILDKITE_COMMIT" \
--tag "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-arm64-cpu \
--target vllm-test \
--progress plain .

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-cpu
docker push "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-arm64-cpu
10 changes: 5 additions & 5 deletions .buildkite/image_build/image_build_hpu.sh
@@ -11,10 +11,10 @@ REPO=$2
BUILDKITE_COMMIT=$3

# authenticate with AWS ECR
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin $REGISTRY
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin "$REGISTRY"

# skip build if image already exists
if [[ -z $(docker manifest inspect $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu) ]]; then
if [[ -z $(docker manifest inspect "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-hpu) ]]; then
echo "Image not found, proceeding with build..."
else
echo "Image found"
@@ -25,10 +25,10 @@ fi
docker build \
--file tests/pytorch_ci_hud_benchmark/Dockerfile.hpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit=$BUILDKITE_COMMIT \
--tag $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu \
--build-arg buildkite_commit="$BUILDKITE_COMMIT" \
--tag "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-hpu \
--progress plain \
https://github.com/vllm-project/vllm-gaudi.git

# push
docker push $REGISTRY/$REPO:$BUILDKITE_COMMIT-hpu
docker push "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-hpu

This file was deleted.

@@ -0,0 +1 @@
Qwen3-235B-A22B-Instruct-2507-FP8.yaml
@@ -41,4 +41,4 @@ lm_eval --model vllm-vlm \
--tasks chartqa \
--batch_size auto \
--apply_chat_template \
--limit $LIMIT
--limit "$LIMIT"
@@ -20,14 +20,11 @@ usage() {
echo
}

while getopts "m:b:l:f:t:" OPT; do
while getopts "m:l:f:t:" OPT; do
case ${OPT} in
m )
MODEL="$OPTARG"
;;
b )
BATCH_SIZE="$OPTARG"
;;
l )
LIMIT="$OPTARG"
;;
10 changes: 8 additions & 2 deletions .buildkite/lm-eval-harness/test_lm_eval_correctness.py
@@ -13,9 +13,10 @@
from contextlib import contextmanager

import lm_eval
import numpy as np
import yaml

from vllm.platforms import current_platform

DEFAULT_RTOL = 0.08


@@ -63,6 +64,9 @@ def launch_lm_eval(eval_config, tp_size):
"allow_deprecated_quantization=True,"
)

if current_platform.is_rocm() and "Nemotron-3" in eval_config["model_name"]:
model_args += "attention_backend=TRITON_ATTN"

env_vars = eval_config.get("env_vars", None)
with scoped_env_vars(env_vars):
results = lm_eval.simple_evaluate(
@@ -102,6 +106,8 @@ def test_lm_eval_correctness_param(config_filename, tp_size):
f"ground_truth={ground_truth:.3f} | "
f"measured={measured_value:.3f} | rtol={rtol}"
)
success = success and np.isclose(ground_truth, measured_value, rtol=rtol)

min_acceptable = ground_truth * (1 - rtol)
success = success and measured_value >= min_acceptable

assert success
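The hunk above replaces a symmetric `np.isclose` comparison with a one-sided lower bound: a measured score may now exceed the ground truth by any margin, but must not fall below it by more than `rtol`. A standalone sketch of the two behaviors (helper names are illustrative, not from the patch; `atol` is omitted for simplicity):

```python
def symmetric_close(ground_truth: float, measured: float, rtol: float) -> bool:
    # Old behavior, np.isclose-style: fails if measured drifts too far
    # in EITHER direction from ground truth.
    return abs(ground_truth - measured) <= rtol * abs(measured)

def one_sided_ok(ground_truth: float, measured: float, rtol: float) -> bool:
    # New behavior: only a regression below the tolerance band fails.
    return measured >= ground_truth * (1 - rtol)

# A score well above ground truth passes the new check but failed the old one.
print(symmetric_close(0.80, 0.95, 0.08))  # False: 0.15 > 0.08 * 0.95
print(one_sided_ok(0.80, 0.95, 0.08))     # True:  0.95 >= 0.736
print(one_sided_ok(0.80, 0.72, 0.08))     # False: 0.72 <  0.80 * 0.92
```

This makes the eval gate tolerant of genuine accuracy improvements while still catching regressions.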
1 change: 0 additions & 1 deletion .buildkite/performance-benchmarks/README.md
@@ -83,7 +83,6 @@ We test the throughput by using `vllm bench serve` with request rate = inf to co
"server_parameters": {
"model": "meta-llama/Meta-Llama-3-8B",
"tensor_parallel_size": 1,
"swap_space": 16,
"disable_log_stats": "",
"load_format": "dummy"
},