Merged
437 commits
e42b49b
Mistral common v10 (#36971)
juliendenize Mar 14, 2026
a8e8d62
[Misc] Clean up Kimi-audio whisper encoder loading (#36903)
Isotr0py Mar 14, 2026
84868e4
[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM for…
seanmamasde Mar 14, 2026
3ed46f3
[Model Runner V2] Add Support for XD-RoPE (#36817)
santiramos27 Mar 14, 2026
5467d13
[Frontend] Avoid startup error log for models without chat template (…
DarkLight1337 Mar 14, 2026
8c29042
[Feature] Add InstantTensor weight loader (#36139)
arlo-scitix Mar 14, 2026
821fde2
[Bugfix] Fix xgrammar dtype mismatch on macOS CPU inference (#32384)
karanb192 Mar 14, 2026
458c1a4
[Frontend] Reduce chat template warmup logging levels (#37062)
njhill Mar 14, 2026
b3debb7
[Build] Upgrade xgrammar to get a security fix (#36168)
russellb Mar 15, 2026
6590a3e
[Frontend] Remove `torchcodec` from audio dependency (#37061)
Isotr0py Mar 15, 2026
143e4dc
[Misc] Add online audio_in_video test (#36775)
Isotr0py Mar 15, 2026
a3e2e25
[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#3…
hasethuraman Mar 15, 2026
697e4ff
[GDN] add a config for gdn kernel selection (#36647)
ZJY0516 Mar 15, 2026
7acaea6
In-Tree AMD Zen CPU Backend via zentorch [1/N] (#35970)
amd-lalithnc Mar 15, 2026
e9163b5
[responsesAPI][ez] add a unit test for SimpleContext logprobs (#37126)
qandrew Mar 16, 2026
0024f39
[ROCm][P/D][MORI][BugFix] Add transfer_id for moriio_connector so mor…
rasmith Mar 16, 2026
68e1b71
[XPU] Add deepseek_scaling_rope fused kernel (#36612)
yitingw1 Mar 16, 2026
d4c5786
[ROCm][CI] Fix engine teardown and text normalization to stabilize vo…
AndreasKaratzas Mar 16, 2026
57a314d
[CI][Bugfix] Fix 500 errors from priority overflow and TemplateError …
AndreasKaratzas Mar 16, 2026
7362b44
[Bugfix] Avoid LD_PRELOAD check on MacOS (#37145)
bigPYJ1151 Mar 16, 2026
2390d44
[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
bigshanedogg Mar 16, 2026
2754231
[Kernel] Add FlashInfer MoE A2A Kernel (#36022)
leo-cf-tian Mar 16, 2026
96efb91
[Model Runner V2] Fix processed logits in sample() (#37144)
WoosukKwon Mar 16, 2026
8d3f8f4
[Bugfix] fix Qwen3.5 tool calling bug (#36774)
chaunceyjiang Mar 16, 2026
911355e
[ROCm] Fix KV copy methods and auto-select attention backend for ROCm…
AndreasKaratzas Mar 16, 2026
a2956a0
[ROCm][CI] Retrying in case of batch variance effects and reducing fl…
AndreasKaratzas Mar 16, 2026
821eb80
[Performance][Model Loader] Skip non-local expert weights during EP m…
esmeetu Mar 16, 2026
52131f8
use skip_all_guards_unsafe to drop global_state and torch_function_mo…
laithsakka Mar 16, 2026
912fbe9
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-vi…
Isotr0py Mar 16, 2026
8374387
[FlashInfer] Revert block_size 16 + head_size 256 workaround on Black…
vadiklyutiy Mar 16, 2026
116ed13
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batche…
haosdent Mar 16, 2026
0115e95
[Frontend][Misc] Remove unused log in `/is_sleeping` (#37093)
esmeetu Mar 16, 2026
d8f8a7a
[Misc] Sync pre-commit to 4.5.1 in workflows and docs (#36675)
SoluMilken Mar 16, 2026
122f75d
Fix pipeline parallel with multimodal models with the Transformers mo…
hmellor Mar 16, 2026
747b068
[Hardware] Replace memory related torch.cuda APIs (#37031)
jikunshang Mar 16, 2026
ad041c7
Fix text only inputs for MRoPE models with the Transformers modelling…
hmellor Mar 16, 2026
bf9a185
GLM4 tool parser: fix streaming mode (#35208)
RNabel Mar 16, 2026
9b005ed
[Docs] Make the link to hardware plugins clearer (#37174)
hmellor Mar 16, 2026
f5e59ee
[Performance] Add prefetch for checkpoints to OS page cache (#36012)
arpera Mar 16, 2026
d61d2b0
[Build] Fix API rate limit exceeded when using `VLLM_USE_PRECOMPILED=…
elvischenv Mar 16, 2026
f9e6db3
[Models][Qwen3 ViT] Keep `max_seqlen` on CPU to prevent D2H sync (#37…
lgeiger Mar 16, 2026
ffbc2e5
Patch Mistral config (#37104)
juliendenize Mar 16, 2026
43a73f8
Remove unused EVS functions in qwen3_vl.py (#37183)
gty111 Mar 16, 2026
04bf5a3
[Spec Decode] Update extract_hidden_states to use deferred kv_connect…
fynnsu Mar 16, 2026
0e5a938
[Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
bbartels Mar 16, 2026
e855d38
[Compile] Fix compile warning in `moe_permute` (#36529)
yewentao256 Mar 16, 2026
8d8855f
[Bugfix] Add safety check and fallback for null scaling factor (#36106)
yuanheng-zhao Mar 16, 2026
18be11f
[BUGFIX]fix CUDA OOM ERROR : invalid argument at cumem_allocator.cpp:…
flutist Mar 16, 2026
ce8cf91
[Compile] Fix compile warning `st256_cs` in `cuda_vec_utils.cuh` (#36…
yewentao256 Mar 16, 2026
5ae685c
[Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer l…
Etelis Mar 16, 2026
6682c23
[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing (#37148)
chaunceyjiang Mar 16, 2026
55e6d3d
[Bugfix] Make siglip/clip compatible with transformers v5 (#37200)
zucchini-nlp Mar 16, 2026
ca1954d
[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37…
haosdent Mar 16, 2026
9f9ecff
Add simple granite4 tool parser (#36827)
maxdebayser Mar 16, 2026
c88ea83
[MTP][Sparse MLA] Take advantage of native MTP support in indexer whe…
MatthewBonanni Mar 16, 2026
f5c081d
[PD][Nixl] Add support for hybrid SSM-FA models (#36687)
NickLucche Mar 16, 2026
0fefd00
[Bugfix] Fix render server crash for quantized models on CPU-only hos…
sagearc Mar 16, 2026
714c6e0
[torch.compile][BE] Modify cudagraph callable to check for is_forward…
Lucaskabela Mar 16, 2026
dfa8852
[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
sfeng33 Mar 16, 2026
2cc26c3
[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for tes…
rasmith Mar 16, 2026
93f3c8e
[Misc] Add `float16` to `CacheDType` (#37199)
MatthewBonanni Mar 16, 2026
d157216
[BUGFIX][Mamba] Use uint64 for address in KVBlockZeroer (#37197)
jikunshang Mar 16, 2026
2dccb38
[Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-conne…
ZhanqiuHu Mar 16, 2026
e6ae4b1
[compile] Enable mega aot artifact for torch 2.12+. (#37198)
zhxchen17 Mar 16, 2026
c0f0119
[Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#3…
KrxGu Mar 16, 2026
fd4d963
Fix eplb nvfp4 experts hook (#37217)
elvircrn Mar 16, 2026
e5b8076
[Quant][Feature] Support online MXFP8 quantization for MoE and dense …
EdalatiAli Mar 16, 2026
a3a51d2
[Benchmark] Improvements to attention benchmark script (#37115)
wzhao18 Mar 16, 2026
31a458c
[Doc] Clarify schema enforcement behavior for tool_choice modes (#37064)
cemigo114 Mar 16, 2026
4f9b14c
[CI] Stabilize multinode DP internal LB completion tests (#36356)
AndreasKaratzas Mar 16, 2026
7961486
Fix EagleMistralLarge3Model initialization (#37232)
juliendenize Mar 16, 2026
3e6a1e1
[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
tianrengao Mar 16, 2026
7a49742
[CI/Build] Add common tool call parser test suite (#27599)
bbrowning Mar 16, 2026
061980c
[Feature][Frontend] add support for Cohere Embed v2 API (#37074)
walterbm Mar 16, 2026
5db91f0
Fix some Mistral parser issues (#37209)
juliendenize Mar 17, 2026
45f526d
[BugFix] Correct max memory usage for multiple KV-cache groups (#36030)
peakcrosser7 Mar 17, 2026
6c1cfba
Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867)
vadiklyutiy Mar 17, 2026
0a0a1a1
Add ability to replace oot ops when using lora (#37181)
kyuyeunk Mar 17, 2026
f04d522
[CI] Fix flaky tool_use chat completion tests with deterministic seed…
sfeng33 Mar 17, 2026
384dc7f
[Refactor] Relocate completion and chat completion tests (#37125)
sfeng33 Mar 17, 2026
54a62a7
[ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe o…
AndreasKaratzas Mar 17, 2026
3e3d320
[Refactor] Relocate responses API tests (#37241)
sfeng33 Mar 17, 2026
17c1bdf
[Bugfix] dtype mismatch in ngram gpu propose (#37246)
PatchouliTIS Mar 17, 2026
20b1409
[Bugfix] Fix loading Music Flamingo (#35535)
NickCao Mar 17, 2026
8a68046
[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447)
benchislett Mar 17, 2026
24b4272
Fix infinite recursive search issue in quark.py (#32779)
xiao-llm Mar 17, 2026
132bfd4
[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds ma…
chaunceyjiang Mar 17, 2026
9c7cab5
[Feature]: Support for multiple embedding types in a single inference…
staugust Mar 17, 2026
4af9ed2
[Bugfix](xpu): prevent “selected index k out of range” in TP decode p…
zhejiangxiaomai Mar 17, 2026
00f8e0d
[Frontend] Delegate tokenization serving preprocessing to OpenAIServi…
sagearc Mar 17, 2026
0fb142a
[perf][connector] optimize build_connector_meta when host buffer tran…
youkaichao Mar 17, 2026
293f036
Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664)
ajpqs Mar 17, 2026
2660b92
Bugfix for offloading+prefetch for GLM-4.7-FP8 (#37178)
sfbemerk Mar 17, 2026
f340324
[1/2] Move InternVL-based processors (#37260)
DarkLight1337 Mar 17, 2026
56cb1ba
[Misc] Use VLLMValidationError in batch, pooling, and tokenize protoc…
umut-polat Mar 17, 2026
59192df
[Frontend] Complete OpenAI render delegation (#37287)
sagearc Mar 17, 2026
77d2a5f
pick up tuned prefill configs for FP8 FA3 (#36265)
jmkuebler Mar 17, 2026
c25dbc2
[Bugfix] Fix unclean shutdown crash with AllReduce Fusion workspace (…
siewcapital Mar 17, 2026
ecfcdd2
Fix Phi3 test that fails with Transformers v5 (#37298)
hmellor Mar 17, 2026
3717a4d
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific m…
bhoomit Mar 17, 2026
a836524
[Chore] Replace all base64 usages with faster pybase64 package (#37290)
Isotr0py Mar 17, 2026
2ff0ad9
[`UltraVox`] Fix output type (#37224)
vasqu Mar 17, 2026
c9e5096
[openapi] remove redundant exception stack trace[4/N] (#37157)
andyxning Mar 17, 2026
f63ed7b
[Bugfix] Fix DP MTP Dummy Run (#35243)
benchislett Mar 17, 2026
979ff44
[BugFix] PyTorch Compilation Tests should error if any test fails (#3…
zou3519 Mar 17, 2026
c781fbb
[Bugfix] Standardize custom HF Processor init (#37289)
DarkLight1337 Mar 17, 2026
4ed5130
[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __…
AndreasKaratzas Mar 17, 2026
51b2333
[Perf] Optimize top-k search in apply_top_k_top_p_triton sampler (#37…
mgoin Mar 17, 2026
c5030c4
[CI] Split Distributed Tests (4 GPUs) and Kernel MoE tests (#37100)
avinashsingh77 Mar 17, 2026
68f783a
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compati…
atalman Mar 17, 2026
bdb903b
[Bug] Fix FlashInfer MNNVL socket collisions under concurrent vLLM jo…
yewentao256 Mar 17, 2026
fa75204
bump compressed-tensors version to 0.14.0.1 (#36988)
brian-dellabetta Mar 17, 2026
51f0acd
[Model] Remove unused `handle_oov_mm_token` (#37321)
DarkLight1337 Mar 17, 2026
e78821b
[Deprecation] Deprecate `--calculate-kv-scales` option (#37201)
mgoin Mar 17, 2026
b36adfa
[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache …
wzhao18 Mar 17, 2026
1204cf0
[Bugfix] Fix mock.patch resolution failure for standalone_compile.Fak…
dbari Mar 17, 2026
2457589
[Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#…
ricky-chaoju Mar 17, 2026
b5ca9c3
[Models] Cohere ASR (#35809)
ekagra-ranjan Mar 17, 2026
c0745a8
[Model] Add ColQwen3.5 4.5B support (#36887)
athrael-soju Mar 17, 2026
de35c06
Make KV connector metadata build overridable via plugin (#37336)
sarckk Mar 17, 2026
e8f9dbc
[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable …
JartX Mar 17, 2026
3ed7b1e
[ROCm] Validate block_size for explicitly selected attention backends…
AndreasKaratzas Mar 17, 2026
09e4576
[Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320)
mgoin Mar 17, 2026
e6c4797
[ROCm][Quantization] add fp8xfp8 attn support for rocm_aiter_unified_…
divakar-amd Mar 18, 2026
ff9fbc9
[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynam…
gmagogsfm Mar 18, 2026
761e0aa
[Performance] Add --enable-ep-weight-filter CLI option (#37351)
esmeetu Mar 18, 2026
58cde5c
[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
AndreasKaratzas Mar 18, 2026
f174000
[Perf] Enable dual stream execution of input projection for Qwen3 (#3…
xyang16 Mar 18, 2026
a0dd199
[Hardware][TPU] Add supports_async_scheduling() method to Executor in…
gxd3 Mar 18, 2026
8b63257
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture …
AndreasKaratzas Mar 18, 2026
ce2ef42
[CI] Stabilize test_cpu_offloading by waiting for async offload befor…
AndreasKaratzas Mar 18, 2026
0e95916
[responsesAPI] parser.extract_response_outputs can take in token IDs …
qandrew Mar 18, 2026
86b7e3c
[XPU] skip unsupported ut and update test_nixl_connector (#37179)
zhenwei-intel Mar 18, 2026
fcf0687
[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
orozery Mar 18, 2026
2618012
[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391)
bigPYJ1151 Mar 18, 2026
8c31f47
[LoRA] Make LoRA respect `language_model_only` (#37375)
jeejeelee Mar 18, 2026
fad09e8
fix(glm47): improve tool call parsing and content normalization (#37386)
karanb192 Mar 18, 2026
47a1f11
[docs] Add docs for new RL flows (#36188)
hao-aaron Mar 18, 2026
eaf7c9b
[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API r…
AndreasKaratzas Mar 18, 2026
b322b19
[Build] Bump python openai version (#32316)
chaunceyjiang Mar 18, 2026
17c47fb
[Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy (#37322)
elvircrn Mar 18, 2026
cef1f30
[Model] Enable LoRA support for tower and connector in H2OVL (#31696)
shwetha-s-poojary Mar 18, 2026
98b09dd
[NIXL][Bugfix] metrics & testing minor bug (#36051)
andylolu2 Mar 18, 2026
918b789
[Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
he-yufeng Mar 18, 2026
525f2ee
[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
orozery Mar 18, 2026
99267c2
[2/3] Refactor InternVL-based processors (#37324)
DarkLight1337 Mar 18, 2026
de1a86b
elastic_ep: Fix stateless group port races (#36330)
itayalroy Mar 18, 2026
c373b5c
[Log] Reduce duplicate log (#37313)
yewentao256 Mar 18, 2026
296839a
[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer M…
elvischenv Mar 18, 2026
1780839
standardize load_weights using AutoWeightsLoader for kimi_linear and …
XLiu-2000 Mar 18, 2026
b1169d7
[Kernel] Add gpt-oss Router GEMM kernel (#37205)
xyang16 Mar 18, 2026
c9d838f
Adding deterministic lora benchmarking to vLLM Bench (#36057)
RonaldBXu Mar 18, 2026
39bfb57
Add API docs link if the CLI arg is a config class (#37432)
hmellor Mar 18, 2026
5dd8df0
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec…
orozery Mar 18, 2026
0ef7f79
[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E through…
yewentao256 Mar 18, 2026
f3732bd
[Misc] Clean up model registry (#37457)
DarkLight1337 Mar 18, 2026
7476d14
[Model] Remove unnecessary processor definition for Nemotron Parse (#…
DarkLight1337 Mar 18, 2026
70b81c4
[bugfix][async scheduling] fix extra cuda context in device 0 with EP…
youkaichao Mar 18, 2026
738d0a2
[Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp …
cnyvfang Mar 18, 2026
5ce2d10
Fix models which use `layer_type_validation` for Transformers v5 (#37…
hmellor Mar 18, 2026
a913b61
[Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795) (#…
JartX Mar 18, 2026
6ae4c8d
chunk parakeet into 30s clips to prevent OOMs on long audios (#36671)
netanel-haber Mar 18, 2026
0d81a1f
[V0 Deprecation] Deprecate virtual engine (#37195)
yewentao256 Mar 18, 2026
0091017
fix(worker): optimize swap_states to copy only active token prefixes …
pjo256 Mar 18, 2026
5bc1da1
[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928)
WoosukKwon Mar 18, 2026
9482b0b
[Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465)
mgoin Mar 18, 2026
04244fd
[Model Runner V2] Spec decode rejection sampler greedy support (#37238)
TheEpicDolphin Mar 18, 2026
577df69
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache…
andylolu2 Mar 18, 2026
828f862
[Bugfix] Expand quantization method support in perf metrics (#37231)
thillai-c Mar 18, 2026
9dade5d
[XPU]Unify xpu test dependencies in dockerfile.xpu (#36477)
1643661061leo Mar 19, 2026
ef2c4f7
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from C…
elvircrn Mar 19, 2026
c32a58c
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
SageMoore Mar 19, 2026
5f82706
[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync ex…
hao-aaron Mar 19, 2026
053f3b6
[Model Runner V2] Spec decode rejection sampler logprobs support (#37…
TheEpicDolphin Mar 19, 2026
6accb21
[bug] Fix deadlock with pause resume and collective_rpc (#37024)
hao-aaron Mar 19, 2026
e37ff5b
[Perf] Optimize token_embed for pooling models, 1.0% token throughput…
yewentao256 Mar 19, 2026
e3126cd
[ROCm] issue management - request information for bug issues on ROCm …
hongxiayang Mar 19, 2026
b21d384
[Refactor] Relocate endpoint tests to mirror serving code directory s…
sfeng33 Mar 19, 2026
d49f273
[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310)
ZhanqiuHu Mar 19, 2026
354cd58
fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic str…
cdpath Mar 19, 2026
d3cc379
[Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425)
ZeldaHuang Mar 19, 2026
0b6d526
Support temporal compression for Nemotron-3-VL videos (#36808)
collinmccarthy Mar 19, 2026
da70c87
[CI] Fix wrong path test file, missing `rlhf_async_new_apis.py` (#37532)
tjtanaa Mar 19, 2026
ca21483
[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_av…
jikunshang Mar 19, 2026
199f914
fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369)
yassha Mar 19, 2026
6a9cceb
[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_…
Duyi-Wang Mar 19, 2026
765e461
[Bugfix] Fix Nemotron Parse loading (#37407)
DarkLight1337 Mar 19, 2026
3322e26
[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile …
bigPYJ1151 Mar 19, 2026
4426447
Don't log `exc_info` when vLLM tries to download a file that doesn't …
hmellor Mar 19, 2026
f9e2a38
[Docs] Reorganize pooling docs. (#35592)
noooop Mar 19, 2026
c7bc12c
[CI/Build] Split out MM pooling tests (#37542)
DarkLight1337 Mar 19, 2026
7a6ebcb
[Model] Remove unnecessary `get_language_model` (#37545)
DarkLight1337 Mar 19, 2026
e390742
Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu…
xueliangyang-oeuler Mar 19, 2026
a32eaf5
[CI] Merge `cleanup_pr_body.yml` and `reminder_comment.yml` (#37552)
hmellor Mar 19, 2026
c63ca2b
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id…
DorBernsohn Mar 19, 2026
9515c20
[Misc] Clean up processing logic (#37541)
DarkLight1337 Mar 19, 2026
572b432
Stop bench CLI from recursively casting all configs to `dict` (#37559)
hmellor Mar 19, 2026
7c0cf3b
Cap the number of API servers to 1 when using Elastic EP. (#37466)
SageMoore Mar 19, 2026
96266f1
[LoRA] Minor improvements to LoRA log (#37557)
jeejeelee Mar 19, 2026
104605c
Remove deprecated reasoning_content message field(part-2) (#37480)
ikaadil Mar 19, 2026
8b10e4f
[1/n] Migrate permute_cols to libtorch stable ABI (#31509)
mikaylagawarecki Mar 19, 2026
40b8363
[MRV2] Use fp32 for draft logits (#37526)
WoosukKwon Mar 19, 2026
e27b8ba
[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#3…
wzhao18 Mar 19, 2026
657855a
[Misc] Cleanup more configs and processors (#37560)
DarkLight1337 Mar 19, 2026
4dce832
Run MacOS smoke test on daily `cron` job instead of every commit (#37…
hmellor Mar 19, 2026
34f093b
[CI] Gate pre-commit on `ready` label or number of contributions (#37…
hmellor Mar 19, 2026
2890aec
[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37…
fadara01 Mar 19, 2026
2f9f946
[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation …
chaunceyjiang Mar 19, 2026
7769b58
[torch.compile][BE][Multimodal] Remove requirement to set_model_tag t…
Lucaskabela Mar 19, 2026
daa05bf
[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM i…
EdalatiAli Mar 19, 2026
e5d96dc
Fix `SpeculatorsConfig` now that `PreTrainedConfig` is a `dataclass` …
hmellor Mar 19, 2026
fb8b5e0
[CI] Add retry with 4x backoff to HTTP fetches for transient failures…
AndreasKaratzas Mar 19, 2026
7454096
[Log] Log once in local node by default (#37568)
yewentao256 Mar 19, 2026
9279c59
[MoE Refactor] DefaultMoERunner simplification (#33049)
bnellnm Mar 19, 2026
040a505
[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
AndreasKaratzas Mar 19, 2026
4ee847e
Comment fix for async rl example (#35244)
hao-aaron Mar 19, 2026
91be5f9
[MoE Refactor] Rename "naive" all2all backend (#36294)
bnellnm Mar 19, 2026
112944f
test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064)
laithsakka Mar 19, 2026
b55156e
[Performance] Enable Triton autotuning disk cache by default (#37188)
arpera Mar 19, 2026
98ff042
[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_…
rasmith Mar 19, 2026
4120a05
Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448)
jhsmith409 Mar 19, 2026
2be1a0f
[Refactor] Remove dead code in pooling model (#37572)
yewentao256 Mar 19, 2026
df3c029
[Bug] Fix EmbedIOprocessor "classify" <-> "embed" (#37573)
yewentao256 Mar 19, 2026
be12afd
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056)
sfeng33 Mar 19, 2026
4ca3fa6
[ROCm][Bugfix] fix cache block size mismatch for aiter unified attent…
divakar-amd Mar 20, 2026
ca1ac1a
Fix DP coordinator ZMQ port TOCTOU (#37452)
itayalroy Mar 20, 2026
e5a77a5
[CI] Update mergify tool-calling label paths (#37478)
sfeng33 Mar 20, 2026
269bf46
fix: disambiguate multimodal prefix cache keys (#36708)
tianshu-Michael-yu Mar 20, 2026
47b7af0
[Feat] Enable CompressedTensorW4A8Int for XPU (#37207)
tianmu-li Mar 20, 2026
ea2c148
[compile][graph_partition]Add tensor size handling (#36038)
fxdawnn Mar 20, 2026
8fbe3f3
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
jeejeelee Mar 20, 2026
9040151
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
sfeng33 Mar 20, 2026
638a872
fix(xpu): Re-compute compile ranges after platform-specific config up…
Liangyx2 Mar 20, 2026
3947451
[Model Runner V2] fix draft attention metadata generation (#37364)
TheEpicDolphin Mar 20, 2026
6951fcd
[XPU] Automatically detect target platform as XPU in build. (#37634)
ccrhx4 Mar 20, 2026
e2d1c8b
[Refactor] Relocate entrypoint tests to match serving code structure …
sfeng33 Mar 20, 2026
30108fc
[Model] Refactor Step3-VL processor to HF style (#37579)
DarkLight1337 Mar 20, 2026
0674d1f
[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293)
Wangbei25 Mar 20, 2026
bdf6a0a
[XPU] bump vllm-xpu-kernels to v0.1.4 (#37641)
jikunshang Mar 20, 2026
0140eaf
[Bug] Fix FlashInfer allreduce fusion workspace uninitialized error (…
wzhao18 Mar 20, 2026
bd8c4c0
[CI] Removing deprecated rlhf examples reference (#37585)
AndreasKaratzas Mar 20, 2026
dcee9be
[Model Runner V2] Fix draft logits not populated during cudagraph rep…
TheEpicDolphin Mar 20, 2026
ed359c4
[Model] Deprecate the score task (this will not affect users). (#37537)
noooop Mar 20, 2026
9cfd4eb
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list …
AndreasKaratzas Mar 20, 2026
37cd9fc
[ROCm][CI] Remove deepep DBO tests on gfx90a (#37614)
AndreasKaratzas Mar 20, 2026
5a4a179
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible…
AndreasKaratzas Mar 20, 2026
6050b93
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/…
sfeng33 Mar 20, 2026
b4c1aef
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypo…
sfeng33 Mar 20, 2026
7fc1472
Merge upstream/main into matthias.awq_gemv
mgehre-amd Mar 20, 2026
cf4e514
Fix mypy errors from upstream merge conflict resolution
mgehre-amd Mar 20, 2026
2 changes: 1 addition & 1 deletion .buildkite/hardware_tests/amd.yaml
@@ -10,7 +10,7 @@ steps:
docker build
--build-arg max_jobs=16
--build-arg REMOTE_VLLM=1
- --build-arg ARG_PYTORCH_ROCM_ARCH='gfx942;gfx950'
+ --build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942;gfx950'
--build-arg VLLM_BRANCH=$BUILDKITE_COMMIT
--tag "rocm/vllm-ci:${BUILDKITE_COMMIT}"
-f docker/Dockerfile.rocm
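As a rough illustration of the hunk above (not part of the PR), the old and new values of `ARG_PYTORCH_ROCM_ARCH` can be compared as semicolon-separated lists. The helper name `parse_arch_list` is hypothetical and exists only for this sketch.

```python
def parse_arch_list(value: str) -> list[str]:
    """Split a semicolon-separated architecture list, dropping empty entries."""
    return [arch.strip() for arch in value.split(";") if arch.strip()]


# Values copied from the removed and added lines of the hunk.
old_archs = parse_arch_list("gfx942;gfx950")
new_archs = parse_arch_list("gfx90a;gfx942;gfx950")

# Order-preserving difference in both directions.
added = [arch for arch in new_archs if arch not in old_archs]
removed = [arch for arch in old_archs if arch not in new_archs]

print("added:", added)
print("removed:", removed)
```

Read this way, the change only extends the target list with `gfx90a`; the two existing targets are kept.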
14 changes: 14 additions & 0 deletions .buildkite/hardware_tests/cpu.yaml
@@ -21,6 +21,20 @@ steps:
pytest -x -v -s tests/kernels/moe/test_cpu_fused_moe.py
pytest -x -v -s tests/kernels/test_onednn.py"

+ - label: CPU-Compatibility Tests
+ depends_on: []
+ soft_fail: true
+ device: intel_cpu
+ no_plugin: true
+ source_file_dependencies:
+ - cmake/cpu_extension.cmake
+ - setup.py
+ - vllm/platforms/cpu.py
+ commands:
+ - |
+ bash .buildkite/scripts/hardware_ci/run-cpu-test.sh 20m "
+ bash .buildkite/scripts/hardware_ci/run-cpu-compatibility-test.sh"
+
- label: CPU-Language Generation and Pooling Model Tests
depends_on: []
soft_fail: true
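The new step declares `source_file_dependencies`. The sketch below assumes these entries are treated as path prefixes that decide whether the step runs for a given set of changed files. That matching rule is an assumption made for illustration; this diff does not state it. `step_should_run` is a hypothetical helper, not a function from the repository.

```python
def step_should_run(changed_files: list[str], dependencies: list[str]) -> bool:
    """Return True if any changed file starts with one of the dependency prefixes.

    Assumption: dependencies are plain path prefixes, not globs or regexes.
    """
    return any(
        path.startswith(prefix)
        for path in changed_files
        for prefix in dependencies
    )


# Dependency list copied from the CPU-Compatibility Tests step above.
cpu_compat_deps = [
    "cmake/cpu_extension.cmake",
    "setup.py",
    "vllm/platforms/cpu.py",
]

print(step_should_run(["setup.py", "README.md"], cpu_compat_deps))
print(step_should_run(["docs/index.md"], cpu_compat_deps))
```

Under that assumption, a commit touching only documentation would skip the step, while one touching `setup.py` would trigger it.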
4 changes: 1 addition & 3 deletions .buildkite/image_build/image_build_cpu.sh
@@ -25,9 +25,7 @@ fi
docker build --file docker/Dockerfile.cpu \
--build-arg max_jobs=16 \
--build-arg buildkite_commit="$BUILDKITE_COMMIT" \
- --build-arg VLLM_CPU_AVX512BF16=true \
- --build-arg VLLM_CPU_AVX512VNNI=true \
- --build-arg VLLM_CPU_AMXBF16=true \
+ --build-arg VLLM_CPU_X86=true \
--tag "$REGISTRY"/"$REPO":"$BUILDKITE_COMMIT"-cpu \
--target vllm-test \
--progress plain .
@@ -0,0 +1 @@
+ Qwen3-235B-A22B-Instruct-2507-FP8.yaml