
[Model Runner V2] Use actual batch max_seq_len for attn metadata #43991

Merged
njhill merged 4 commits into vllm-project:main from izhuhaoran:fix-mrv2-hybrid-attn-max-seq-len
Jun 2, 2026

@izhuhaoran
Contributor

Purpose

PR #40654 switched DefaultModelState to build attention metadata with the actual batch max_seq_len instead of max_model_len. This follow-up applies the same handling to the two Model Runner V2 paths it missed: MambaHybridModelState.prepare_attn and the eagle/MTP draft path _build_draft_attn_metadata, both of which still passed max_seq_len=max_model_len.

Passing max_model_len to FlashInfer makes the TRTLLM attention kernel walk past the valid block-table entries. The fix uses the actual batch maximum for normal runs and the worst-case max_model_len only during CUDA graph capture.
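
The selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: `pick_max_seq_len` and `blocks_needed` are hypothetical helper names, not functions from vLLM or FlashInfer, and the block size of 16 is an arbitrary example value.

```python
from typing import Sequence


def pick_max_seq_len(
    seq_lens: Sequence[int],
    max_model_len: int,
    for_cudagraph_capture: bool,
) -> int:
    """Hypothetical helper: choose the max_seq_len to put in attention metadata."""
    if for_cudagraph_capture:
        # Capture must cover the worst case, so use the model-wide limit.
        return max_model_len
    if not seq_lens:
        # Empty batch: there is nothing for the kernel to read.
        return 0
    # Normal run: bound the kernel by the longest sequence actually in the batch.
    return min(max(seq_lens), max_model_len)


def blocks_needed(max_seq_len: int, block_size: int) -> int:
    """Hypothetical helper: block-table entries a kernel would visit (ceil division)."""
    return -(-max_seq_len // block_size)


if __name__ == "__main__":
    seq_lens = [17, 300, 42]
    normal = pick_max_seq_len(seq_lens, max_model_len=8192, for_cudagraph_capture=False)
    capture = pick_max_seq_len(seq_lens, max_model_len=8192, for_cudagraph_capture=True)
    print(normal, blocks_needed(normal, 16))
    print(capture, blocks_needed(capture, 16))
```

With these example numbers, the normal-run bound covers far fewer block-table entries than the capture-time bound, which is the gap a kernel could otherwise walk into when handed max_model_len for a short batch.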

@mergify mergify Bot added the v1 label May 29, 2026
@izhuhaoran
Contributor Author

@WoosukKwon @njhill Could you please review this PR when you have time?

Member

@njhill njhill left a comment

Thanks @izhuhaoran, great catch!

Comment thread vllm/v1/worker/gpu/model_states/mamba_hybrid.py Outdated
Comment thread vllm/v1/worker/gpu/spec_decode/eagle/speculator.py Outdated
Comment thread vllm/v1/worker/gpu/spec_decode/eagle/speculator.py Outdated
@njhill njhill added the v2 label May 29, 2026
…tion metadata

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
@izhuhaoran izhuhaoran force-pushed the fix-mrv2-hybrid-attn-max-seq-len branch from 84e040b to 83cc46b Compare May 30, 2026 06:17
@njhill njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 30, 2026
@njhill njhill enabled auto-merge (squash) May 30, 2026 18:20
@izhuhaoran
Contributor Author

@njhill Thanks for the review! By the way, I think the CI error "ImportError: libcudart.so.12: cannot open shared object file: No such file or directory" is unrelated to this PR. Could you please merge this PR manually or re-run the failed CI job?

@njhill njhill merged commit 1edfd09 into vllm-project:main Jun 2, 2026
74 checks passed
hynky1999 added a commit to macrodata-labs/vllm that referenced this pull request Jun 2, 2026
* [MM] Enable FlashInfer metadata support for Qwen2.5-VL vision attention (#42787)

Signed-off-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [Docs] Fix stale version number in token_embed.md (#43488)

Signed-off-by: holegots <ikun3.1415927@gmail.com>

* [Docs] Fix stale version number in token_classify.md (#43489)

Signed-off-by: holegots <ikun3.1415927@gmail.com>

* [MoE] Migrate W4A8 CT to oracle kernel setup (#42680)

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>

* [Mooncake] Add metrics for MooncakeStoreConnector operations (#43392)

* [ROCm][Critical] Fix the GDN import bug (#43486)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Revert "[Misc] add humming to dependencies" (#43492)

* [Bugfix] Fix reasoning dropped on streaming boundary deltas (#42691)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [Model Runner v2] Force v1 runner for tests (#43233)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [KV Connector] Keep MooncakeStore full hits block-aligned (#43494)

Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [kv_offload]: Add DSv4 support (#43142)

Signed-off-by: Or Ozeri <oro@il.ibm.com>

* [ROCm][CI] Stabilize 400 error return code for invalid schema inputs (#43016)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [ROCm] [DSv4] [Perf] Support DeepSeek v4 MTP (#43385)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Tuning script and configs for Triton Mamba SSU kernel (#43083)

Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>

* File system secondary tier implemented in python (#41735)

Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>

* [Kernel] Add mhc_pre_big_fuse_with_norm_tilelang  (#43474)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* fix: MoE model using shared routed experts crashes on AMD GPUs (#42373)

Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>

* [Docs] Reorganize offline inference docs.  (#43552)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)

Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [Feat][KVConnector] Support DSV4 in SimpleCPUOffloadBackend (#42296)

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

* [Doc] Add section on escalating stalled contributions (#43568)

Signed-off-by: esmeetu <jasonailu87@gmail.com>

* Reduce memory usage for granite_speech. (#42933)

Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [KV Connector] Handle Mooncake finish after preemption (#43281)

Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>

* [Misc] Print accuracy value for PD tests even on success  (#43583)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Kernel] Remove NormGateLinear (#43554)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [XPU] Ensure RNG offset alignment with PyTorch requirements in XPU sampler (#43028)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [LoRA] Add one shot triton kernel For MoE LoRA (#42290)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [DeepSeek V4] Move MegaMoE input prep kernel to nvidia/ops (#43632)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [KV Connector][Bugfix] MooncakeStore: don't double-apply Eagle prune in load_mask (#43516)

Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [KV Connector] Propagate MooncakeStore load failures (#42788)

Signed-off-by: Dao Le <Dao007forever@gmail.com>

* [Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler (#43194)

Signed-off-by: Yan Ma <yan.ma@intel.com>

* [Frontend] Split the offline inference APIs and utils. (#43553)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [Bugfix][Model] Fix GPT2ForSequenceClassification sub-module prefix (#43579)

Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [GDN] GDN Prefill kernel for SM100 (#43273)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [CPU] Enable non-divisible GQA for decode workitems in mixed batches (#43032)

Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>

* Upgrade tpu-inference to v0.20.0 (#43394)

* Add CuTe DSL sparse compressor support (#43584)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>

* [chores][log] change registry log from `warning` to `debug` (#43045)

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [Bugfix] Apply fc_norm in Eagle3DeepseekV2 combine_hidden_states (#43482)

Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [KV Transfer] Enable HMA by default for connectors that support it (#41847)

Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>

* [Misc][Refactor][ROCm] Convert MoRI-related envvars to extra config args (#43303)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

* [Misc] Support interleaved custom image benchmark datasets (#43636)

Signed-off-by: ThibaultCastells <thib.castells@icloud.com>

* [Reasoning] [Bugfix] Reject invalid thinking_token_budget values (#43402)

Signed-off-by: linzm1007 <linzm1007@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Model] Use AutoWeightsLoader for InternLM2 (#38278)

Signed-off-by: Jesus De Jesus <dejesus.9297@gmail.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [XPU] Fix fused MoE LoRA kernel crash on XPU by using platform-agnos num_compute_units (#43646)

Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>

* Fix CuPy runtime deps and restore humming (#43530)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ROCm][CI] Extend ROCm quick reduce coverage (#40990)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)

* [MoE Refactor] Migrate ModelOptMxFp8FusedMoE to oracle (#42768)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [MoE Refactor] W4a8 int8 oracle (#42789)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [ROCm] Remove MegaMoE integration in deepseek v4 (#43629)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Add LM head quantization support for ModelOpt (#42124)

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

* [Doc] Add line limit to AGENTS.md (#43635)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>

* [DSv4] Drop _get_compressed_kv_buffer in DeepseekCompressor (#43690)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [CI] Soft-fail AMD entrypoints mirror tests (#43709)

Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Kernel] Porting  fuse_minimax_qk_norm  to manual fusion (#43410)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [KV Connector] MooncakeStore: drop dead discard_partial_chunks parameter (#43627)

Signed-off-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Bugfix][V1] Fix TOCTOU race causing intermittent `EADDRINUSE` on multi-API-server DP startup (#42585)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ci] Add arm64 ci image (#41303)

Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Bugfix] Split attention groups by num_heads_q for spec-decode drafts (#43543)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [Rust Frontend] Add reasoning/tool parser & renderer roundtrip tests (#43582)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [ROCm][CI] Fix ROCm multimodal Qwen2.5-VL activation compile and Phi4MM ragged image mask handling (#43647)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Perf] Optimize Fp8BlockScaledMMLinearKernel input_scale tensor using new_empty() (#43677)

Signed-off-by: Xin Yang <xyangx@amazon.com>

* [Attention] Make FlexAttention and FlashAttention use num-blocks first layouts (#42095)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

* [MLA][Attention] Add OOT MLA prefill backend registration mechanism (#43325)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

* [Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [DSv4] Refactor compressor & Fix ROCm compatibility (#43710)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Fix test_aot_compile for torch 2.12 (#43695)

Signed-off-by: Angela Yi <yiangela7@gmail.com>

* [KVConnector][Mooncake] Wire reset_cache cascade end-to-end (#42694)

Signed-off-by: aoshen524 <aoshen524@gmail.com>
Signed-off-by: Ao Shen <aoshen@inferact.ai>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [ROCm][Perf] Expose AITER MoE sorting dispatch policy via env var (#39177)

Signed-off-by: nholmber <nholmber@users.noreply.github.com>

* [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [Frontend] Add MiniCPM5 XML tool call parser (#43175)

Signed-off-by: zhangtao <zhangtao2@modelbest.cn>
Signed-off-by: zhangtao2 <zhangtao2@modelbest.cn>
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [ROCm][GPT-OSS] Avoid repeated compile-time `cos_sin_cache.to(bf16)` casts in rotary path (#42833)

Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>

* [Doc] Add Ascend NPU tab to the quickstart installation guide (#43550)

Signed-off-by: Aditya Singh <adisin650@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Rust Frontend] Align tool parser fallback behavior between streaming & non-streaming paths (#43662)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Docs] Fix MLA prefill backend default docs (#43697)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Kernel] Enable TritonW4A16LinearKernel as CUDA fallback for non-Marlin-aligned W4A16 shapes (#43731)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [misc] Bump cutedsl version to 4.5.2 (#43745)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1 (#39155)

Signed-off-by: Injae Ryou <injaeryou@gmail.com>

* [Docs] Fix the duplicate doc icon issue (#43546)

Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>

* Fix early CUDA init (#43791)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [ROCm] mori: add InterNodeV1LL inter-node kernel selection via VLLM_MORI_INTERNODE_KERNEL (#41751)

Signed-off-by: jatseng-ai <jatseng@amd.com>

* [8/n] Migrate merge_attn_states, mamba, sampler to torch stable ABI (continued) (#43361)

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [Quantization] Fix Humming RoutedExperts import (#43540)

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* Remove Transformers forward/backward compatibility tests (#43785)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Validate against some config fields being set to 0 (#43794)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix][DFlash]allocate the proper number of lookahead slots (#43733)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>

* Fix Qwen3-VL and Qwen3-omni-thinker accuracy degradation from deepstack inputs under torch.compile (#43617)

Signed-off-by: Dakai An <dakaian108@gmail.com>

* Add @AndreasKaratzas to CODEOWNERS (#43740)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Bugfix][Kernel] TRTLLM NVFP4 MoE chunking (#43599)

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>

* [ModelRunnerV2][Hybrid model] Support kernel block size in hybrid model (#38831)

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Rust Frontend] Introduce mock engine for benchmark baseline (#43469)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* Fix RunAI streamer tensor buffer reuse during weight loading (#43464)

Signed-off-by: bbartels <benjamin@bartels.dev>

* [MoE] Remove inplace fused experts mechanism (#43727)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [Misc][Rocm] Remove redundant `AiterUnifiedAttentionBackend` block size log (#43664)

Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [ROCm][CI] Stabilize Cargo cache and pre-test image checks (#43815)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* fix: parse Qwen3 XML JSON arguments first (#43243)

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>

* [Bugfix] Pass `routed_scaling_factor` to FlashInfer TRTLLM BF16 MoE (#43769)

* [BugFix] Fix blocked reasoning parsing with MRV2 (#43808)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Bugfix][Frontend] streaming tool-call serializer drops first args chunk when name and args share a DeltaMessage  (#42683)

Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: sfeng33 <4florafeng@gmail.com>

* minor docs: fix incorrect example path (#43830)

Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>

* [ROCm][DSV4] Enable Tilelang MHC replacing torch/triton mhc (#43679)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* change name of fs_python secondary tier to fs. (#43600)

Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>

* [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Kernel] Marlin MoE: include SM 12.x in default arch list (#40923)

Signed-off-by: Tony Liu <tonyliu0512@gmail.com>
Co-authored-by: Tony Liu <tonyliu0512@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [DSV4] Remove AMD/XPU path in deepseek_v4/nvidia (#43829)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Restore `Literal` for `WeightTransferConfig.backend` (#43183)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix] Stream DeepSeek DSML tool-call argument deltas incrementally (#42879)

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [ROCm][CI] Move workload from MI300 to MI325 (#43824)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Feature] Add support for timed trace replay in `vllm bench serve` to replay Moonshot and Alibaba workload traces (#39795)

Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>

* [UX] Increase DP Coordinator startup timeout from 30s to 120s (#42343)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

* [Model][Bugfix] Rename weight_mapper to hf_to_vllm_mapper in LlamaNemotronVL pooling models (#43581)

Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: opencode <noreply@opencode.ai>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>

* [Bugfix][ROCm] Fix Accuracy Drop in Sparse Indexer on gfx950 (#43781)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

* [Bugfix] Fix HyperCLOVAX CI failure after upstream removed remote code (#43860)

Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [CI] Auto-apply `rust` label to relevant PRs (#43866)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Feature] Add structured output and effort support to Anthropic Messages API (#42396)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* Log dummy DP step in iteration details (#41406)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>

* [EC Connector] Add shutdown API to EC Connector. (#42423)

Signed-off-by: omerpaz95 <omerpaz95@gmail.com>

* Fix `OlmoHybridForCausalLM` not initialising (#43846)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [BUGFIX] Multimodal benchmark with MistralTokenizer (#42965)

Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* [Perf] Optimize moe permute by pre-allocate buffer, 9~14% kernel performance improvement (#43014)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Perf][KDA] Fuse gate softplus, chunk-local cumsum, and RCP_LN2 scaling (#43667)

Signed-off-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>

* Add token-offset based selective offload in OffloadConnector (#39983)

Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>

* [Model Refactoring] Remove torch compile dependency in DSv4 (#43746)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Bugfix][ROCm] Resolve MoRI connector hangs at high concurrency (#40344)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

* [CPU] Migrate cpu_awq into awq_marlin (#43841)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Rust Frontend] Add `hy_v3` tool parser (#43872)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Rust Frontend] Reduce Gemma4 tool parser args scan complexity (#43850)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [rust] fix: aggregate `is_sleeping` and `reset_prefix_cache` across DP engines (#43429)

Signed-off-by: Will.hou <1205157517@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Bug] Fix `tests/distributed/test_elastic_ep.py  - assert False` (#43813)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Perf] Add do_not_specialize to Mamba SSD chunk kernels (#43803)

Signed-off-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: Majid Taheri Andani <tahemaji@amazon.com>

* [Bugfix] Exclude Ray DP from #42585's deferred port allocation (#43864)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

* [KV Offload] Rename `SecondaryTierManager.get_finished()` to `get_finished_jobs()` (#43870)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

* [ROCm][Perf] Support N=5 in wvSplitK skinny GEMM kernels for speculative decoding (#40687)

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

* [XPU][MoE] Add WNA16 oracle backend for GPTQ sym-int4 (xpu_fused_moe) (#41426)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [ROCm] Bump ROCm to 7.2.3 (#43136)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

* Add Cosmos3 Reasoner model (#43356)

Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>

* [Rust Frontend] Optimize multimodal prompt expansion (#43670)

Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>

* Allow native KV cache dtype in Triton cache update (#43330)

Signed-off-by: Michael Gschwind <mgschwind@nvidia.com>
Co-authored-by: Michael Gschwind <mgschwind@nvidia.com>

* [Attention][AMD] Standardize kv layout to blocks first for AMD (#43660)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [ROCm] Enable the aiter top-k/top-p sampler by default (#43331)

Signed-off-by: John Qin <yanyuan.qin@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

* [MM][CG] Avoid over-padding Qwen2.5-VL encoder cudagraph window metadata (#42796)

Signed-off-by: Hua Huang <huah@nvidia.com>

* Deprecate `JAISLMHeadModel` (#43784)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Feat] Add support for per GPU worker RDMA NIC selection (#42083)

Signed-off-by: Raj Joshi <rajjoshi@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* [Core] Cleanup KVConnector handling with PP + fix MRV2  (#43732)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [KV Offload] Add per-request offloading policy via `on_new_request` lifecycle hook (#43205)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Model Refactoring] Remove unncessary torch op registration for DSv4 (#43891)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Spec Decode] Allow causal DFlash (#43445)

* Refactor output filename handling in ci-fetch-log.sh (#43901)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* [AMD][CI][BugFix] Fix  Distributed Compile Unit Tests (2xH100-2xMI300) group (#43120)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* fix(frontend): Add multimodal placeholders to Gemma4 tool message template (#41459)

Signed-off-by: Harshal Janjani <harshaljanjani@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>

* [CI] Enable prefix caching in BFCL benchmark (#43925)

Signed-off-by: Yifan Zong <yzong@redhat.com>

* [Model]Support Step-3.7-Flash (#43859)

Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>

* [Rust Frontend] Add `/version` endpoint using engine-reported value (#43854)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Misc][NUMA] Auto-bind to PCT priority cores on DGX B300 + widen EngineCore across shard NUMA nodes (#43270)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Cursor <noreply@cursor.com>

* [DSv4] Move mHC tilelang kernels & Don't use CustomOP in dsv4/nvidia (#43905)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [feat] add GlmgaProcessor specific logits in `glm4_1v.py` (#43575)

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>

* Adjust design around encoder_cudagraph_forward (#42288)

Signed-off-by: Weida Hong <wdhongtw@google.com>

* [XPU] add scale transpose to prepare_fp8_moe_layer_for_xpu and bump up kernels (#43277)

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [kv_offload] Skip decode-phase blocks in CPU offload (#43797)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>

* [Refactor] Remove dead code (#43234)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [9/n] Migrate attention and cache kernels to torch stable ABI (continued)  (#43717)

Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [CI] Separate non-root smoke tests from image build step (#43712)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [XPU] add gelu_tanh to xpu moe backend supported activations (#42822)

Signed-off-by: yintong-lu <yintong.lu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [CPU Backend] CPU top-k and top-p sampling kernels using Triton (#43633)

Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [ROCm][DSv4] Remove device pipeline stall in sparse attention (#43898)

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

* [Frontend]Responses API supports chat_template_kwargs (#43761)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [ROCm][CI] Fix AITER unified attention for encoder-decoder cross-attention (#43945)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [XPU] fix xpu install document triton-xpu version (#43947)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* [CI][ROCm] Don't skip MoRI-IO Connector tests (#43703)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

* [XPU] support MTP of gdn attention (#43565)

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [CI] Nixl+SimpleCPUOffloadingConnector unit tests (#43871)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Bugfix] Fix Step3 pipeline parallel KeyError for residual tensor (#37622)

Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

* [Kernel][ROCm] Native W4A16 kernel for AMD RDNA3 (gfx1100) — fp16 + bf16 (#41394)

Signed-off-by: JartX <sagformas@epdcenter.es>

* [Bugfix] [ROCm] [DSV4] Fix AITER MXFP4 MoE weight loading and shuffle… (#42595)

Co-authored-by: MHYangAMD <MHYangAMD@users.noreply.github.com>

* [ROCm][Perf] DSv3.2 MI355X TP4 decode-step orchestration cleanup (3 micro-opts) (#42982)

Signed-off-by: Frida Andersson <fanderss@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* [Bugfix] Corrupted MLA + linear attention (#43961)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* Skip docs build if PR doesn't affect docs (#43972)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix][CPU] Remove invalid extra deps (#43977)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* Add vLLM library info to Hugging Face Hub requests (#43857)

Signed-off-by: Wauplin <lucainp@gmail.com>
Signed-off-by: Lucain Pouget <lucain@huggingface.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* docs: clarify ITL acronym in optimization docs (#43922)

Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>

* [Misc] added unit tests for the core pooling methods (#43818)

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [Bugfix] Disable allreduce_rms_fusion when pipeline_parallel_size > 1 (#43616)

Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude <noreply@anthropic.com>

* [MoE Refactor] WNA16 MoE backend selection into oracle module (#42553)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [EPLB] Make async EPLB default (#43219)

Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

* [Bugfix] Use storage_block_size in KV cache reshape for compressed specs (DeepSeek V4) (#43988)

Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [Bugfix] Fix Ray placement group allocation with grouped nodes (#43998)

Signed-off-by: <conway.zhu@cohere.com>
Signed-off-by: root <conway.zhu@cohere.com>

* [Bug] Fix torch device issue for MOE permute (#44005)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [CI] Make Model Executor test hangs fail fast with a traceback (#43971)

Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [CI] Remove redundant test_chat_with_tool_reasoning.py (#44011)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* Add @khluu to CODEOWNERS (#44019)

Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

* [Feature] SSL support for dp supervisor (#43688)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Metrics] Exclude KV transfer tokens from iteration_tokens_total (#43346)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Frontend] Clean up stop_token_ids override for Harmony (#44009)

Signed-off-by: Yifan Zong <yzong@redhat.com>

* [MoE Refactor] Migrate MoeWNA16Method quantization to MK oracle (#42647)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [MoE Refactor] Remove supports_expert_map (#43108)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [CI] Remove duplicate Harmony test coverage (#44023)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [CI] Fix smoke test step key to bypass block gate (#43974)

Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "[MoE Refactor] Migrate MoeWNA16Method quantization to MK orac… (#44033)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [PERF]MiniMax-M2 gate kernel (#38445)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>

* offload prompt_embeds decode in render_prompts_async to avoid blocking (#43792)

Signed-off-by: Gagan Dhakrey <gagandhakrey@gmail.com>

* [Refactor] Remove dead current_tool_name_sent assignments from tool parsers (#43997)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [ROCm][CI] Fix failure in the Phi3V pooling test (#44028)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [ROCm] cmake: support PYTORCH_FOUND_HIP for torch 2.13 native HIP language support (#43881)

Signed-off-by: nemanjaudovic <nudovic@amd.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [BugFix][Platform] Fix import vllm.platforms.rocm error on non-CUDA test_gpt_oss.py (#43571)

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [Bugfix] Fix RMSNorm kernels to multiply in weight's native dtype (#42379)

Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [ROCm] Add attention sink support to AITer flash attention backend (#43817)

Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>

* [Governance] Add @BugenZhao as Rust frontend code owner (#44047)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Bug] Fix gemma4 MTP IMA issue when TP>1, `CUDA error: an illegal memory access was encountered` (#43909)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [MRV2] Support breakable CUDA graph (#44050)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [CPU][Zen] Route W8A8 and W4A16 linear inference through zentorch on AMD Zen CPUs (#41813)

Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

* [CI/Build] Enable Step3p7ForConditionalGeneration testing (#43956)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* docs: fix MLA attention docstring examples (#44118)

Co-authored-by: nightcityblade <nightcityblade@gmail.com>

* [Misc] Use VLLMValidationError consistently in chat completion and completion protocol validators (#36254)

Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>

* [MRV2] Remove Eagle's dedicated CUDA graph pool (#44078)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

* [BugFix] Fix `_has_module` to verify native deps via trial import (#44035)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Docs] Replace broken video url in examples (#44159)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [CPU][RISC-V] Add missing RVV cpu_types helpers for WNA16 (#42730)

Signed-off-by: wcy <233313160abc@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>

* fix: glm5.1 pp model loading (#42944)

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

* [Frontend] Resettle generative scoring entrypoint. (#44153)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

* [Rust Frontend] Add InternLM2 tool parser (#43481)

Signed-off-by: Will.hou <1205157517@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

* [Bugfix] fix wrong partial_rotary_factor calculation for bailing_moe model. (#43770)

Signed-off-by: zzt <zengzetang.zzt@antgroup.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>

* [XPU][CI] Fix test_audio_in_video flake by using module-scoped server fixture (#44146)

Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>

* [Perf] Optimize cutlass fp8 scaled mm bypassing padding, 20% kernel performance improvement (#43706)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)

Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [Kernel][DSv4] Optimize sparse FP8 compressor kernels (#44161)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [ROCm][CI] Fix and stabilize EAGLE3 acceptance tests (#41294)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>

* [Rust Frontend] Support streaming `generate` endpoint (#43779)

Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

* [Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>

* [BugFix][CI] Fix added `_has_module` tests (#44248)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Test][BugFix] Fix double-BOS in PD+specdec acceptance test (#44234)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [DSV4] Remove unnecessary classes & functions (#44246)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [ROCm][CI] Skip unbacked dynamic shapes tests on PyTorch < 2.11 (#44256)

Signed-off-by: JartX <sagformas@epdcenter.es>

* [DSV4] Refactor RoPE initialization (#44262)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Bugfix][Mooncake] Release GPU pin on failed store in MooncakeStoreConnector (#43742)

Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [ROCm] Upgrade AITER to v0.1.13.post1 (#44265)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

* [Bugfix][CI] Normalize NIXL connector CUDA wheel installs (#44266)

Signed-off-by: Alec Flowers <aflowers@nvidia.com>

* [Refactor] Move unstreamed tool-arg flush from serving layer to parser (#44017)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [CI] Stabilize OpenAI schema fuzzing for malformed structural tags (#44131)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [BugFix] Fix TypeError in MiniCPM-O audio feature unpadding (#38053)

Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Signed-off-by: wjinxu <1299461899@qq.com>
Signed-off-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Co-authored-by: wjinxu <1299461899@qq.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Kc Balusu <kcbalusu@users.noreply.github.com>

* [BugFix][kv_offload]: Prevent offloading stale sliding window blocks (#42959)

Signed-off-by: Or Ozeri <oro@il.ibm.com>

* [XPU][Bugfix] Fix per_token_group_fp8_quant missing dummy args on XPU (#43930)

Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [MM][CG] Profile encoder CUDA graph pool memory (#41714)

Signed-off-by: JooHo Lee <jooho414@gmail.com>

* [Bugfix] Convert Gemma4-MM ViT linear layers to vllm native impl (#43798)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>

* [Model Runner V2] Support zeroing freshly allocated KV blocks for hybrid + fp8 KVCache (#43990)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

* [Model Runner V2] Use actual batch max_seq_len for attn metadata (#43991)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

* [Refactor] Unify reasoning + tool-call parsing behind Parser.parse() (#44267)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

---------

Signed-off-by: Hua Huang <huah@nvidia.com>
Signed-off-by: holegots <ikun3.1415927@gmail.com>
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: ThibaultCastells <thib.castells@icloud.com>
Signed-off-by: linzm1007 <linzm1007@126.com>
Signed-off-by: Jesus De Jesus <dejesus.9297@gmail.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Kevin Luu <kevin@inferact.ai>
Signed-off-by: Zhewen Li <zhewen@inferact.ai>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Angela Yi <yiangela7@gmail.com>
Signed-off-by: aoshen524 <aoshen524@gmail.com>
Signed-off-by: Ao Shen <aoshen@inferact.ai>
Signed-off-by: nholmber <nholmber@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: zhangtao <zhangtao2@modelbest.cn>
Signed-off-by: zhangtao2 <zhangtao2@modelbest.cn>
Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
Signed-off-by: Aditya Singh <adisin650@gmail.com>
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: jatseng-ai <jatseng@amd.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Signed-off-by: Dakai An <dakaian108@gmail.com>
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>
Signed-off-by: Tony Liu <tonyliu0512@gmail.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: haojiangzheng <justineric096@gmail.com>
Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Will.hou <1205157517@qq.com>
Signed-off-by: Majid Taheri Andani <tahemaji@amazon.com>
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Michael Gschwind <mgschwind@nvidia.com>
Signed-off-by: John Qin <yanyuan.qin@amd.com>
Signed-off-by: Raj Joshi <rajjoshi@redhat.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Harshal Janjani <harshaljanjani@gmail.com>
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Weida Hong <wdhongtw@google.com>
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: yintong-lu <yintong.lu@intel.com>
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Frida Andersson <fanderss@amd.com>
Signed-off-by: Wauplin <lucainp@gmail.com>
Signed-off-by: Lucain Pouget <lucain@huggingface.co>
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: zixi-qi <zixi@inferact.ai>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: <conway.zhu@cohere.com>
Signed-off-by: root <conway.zhu@cohere.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Signed-off-by: Gagan Dhakrey <gagandhakrey@gmail.com>
Signed-off-by: nemanjaudovic <nudovic@amd.com>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: wcy <233313160abc@gmail.com>
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Signed-off-by: zzt <zengzetang.zzt@antgroup.com>
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Signed-off-by: wjinxu <1299461899@qq.com>
Signed-off-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Signed-off-by: JooHo Lee <jooho414@gmail.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Hynek Kydlicek <kydlicek.hynek@gmail.com>
Co-authored-by: Hua Huang <huangh1994@outlook.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Holegots <fuergaosi@gmail.com>
Co-authored-by: Siddharth Bedekar <104613085+bedeks@users.noreply.github.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Dao007forever <dao007forever@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>
Co-authored-by: Rotem Shavitt <rshavitt@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nguyễn Thế Duy <dtnguyen@nvidia.com>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Roy Wang <jasonailu87@gmail.com>
Co-authored-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Zhewen Li <zhewenli@meta.com>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Huanyu Yang <20242081160@mail.dlut.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: zhao, zhenhui <zhenhui.zhao@intel.com>
Co-authored-by: Sting Lin <sting.lin@cienet.com>
Co-authored-by: Jie Fang <jief@nvidia.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Hank_ <37239608+ILikeIneine@users.noreply.github.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Ethan Feng <ethan.fengch@gmail.com>
Co-authored-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: Thibault Castells <38716394+ThibaultCastells@users.noreply.github.com>
Co-authored-by: linzm1007 <96732179+linzm1007@users.noreply.github.com>
Co-authored-by: Javier De Jesus <javier.dejesusj9@gmail.com>
Co-authored-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Luciano Martins <22145370+lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Angela Yi <yiangela7@gmail.com>
Co-authored-by: aoshen02 <aoshen@inferact.ai>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Nico Holmberg <nico.holmberg@amd.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: zhangtao2-1 <478679312@qq.com>
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: akii96 <aakif.nawaz@amd.com>
Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com>
Co-authored-by: Ashwin Giridharan <ashwing@users.noreply.github.com>
Co-authored-by: Injae Ryou <injaeryou@gmail.com>
Co-authored-by: Chunyang Wen <chunyang.wen@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: jatseng-ai <jatseng@amd.com>
Co-authored-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Minh Vu <vuhoangminh97@gmail.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Dakai An <77474977+andakai@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>
Co-authored-by: JINO ROHIT <find.jinorohit@gmail.com>
Co-authored-by: tonyliu312 <56969792@qq.com>
Co-authored-by: Tony Liu <tonyliu0512@gmail.com>
Co-authored-by: jack <QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: Animesh Trivedi <animesh.trivedi@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Co-authored-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: opencode <noreply@opencode.ai>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: kliuae <17350011+kliuae@users.noreply.github.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: omerpaz95 <73347585+omerpaz95@users.noreply.github.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: zexplorerhj <zhjoneson@163.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: Angelo Ruocco <angeloruocco90@gmail.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Will.hou <1205157517@qq.com>
Co-authored-by: Majid <mjtaheri68@gmail.com>
Co-authored-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Jason Elie Bou Kheir <5115126+jasonboukheir@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Chao-Ju Chen <ricky.chen@infinirc.com>
Co-authored-by: Mike G <180722391+mikekg@users.noreply.github.com>
Co-authored-by: Michael Gschwind <mgschwind@nvidia.com>
Co-authored-by: JohnQinAMD <yanyuan.qin@amd.com>
Co-authored-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Raj Joshi <rajjoshi@g.harvard.edu>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Harshal Janjani <harshaljanjani@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: yzong-rh <yzong@redhat.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: Cursor <noreply@cursor.com>
Co-authored-by: Jared Wen <w13431838023@gmail.com>
Co-authored-by: Weida Hong <wdhongtw@google.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Yintong Lu <yintong.lu@intel.com>
Co-authored-by: Tianmu Li <tianmu.li@intel.com>
Co-authored-by: Joaquín Mondéjar <111321569+JMonde@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: MHYangAMD <meng-hsuan.yang@amd.com>
Co-authored-by: MHYangAMD <MHYangAMD@users.noreply.github.com>
Co-authored-by: frida-andersson <fanderss@amd.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
Co-authored-by: Ilya Markov <markovilya197@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: czhu-cohere <conway.zhu@cohere.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Gagan Dhakrey <59848316+gagandhakrey@users.noreply.github.com>
Co-authored-by: nemanjaudovic <152565955+nemanjaudovic@users.noreply.github.com>
Co-authored-by: Liangliang Ma <liangliang.ma@intel.com>
Co-authored-by: Lanze Liu <86434077+liulanze@users.noreply.github.com>
Co-authored-by: Xiaoran <claire.rrchen@hotmail.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Aakar Dwivedi <82587125+aadwived@users.noreply.github.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: nightcityblade <jackchen@haloailabs.com>
Co-authored-by: nightcityblade <nightcityblade@gmail.com>
Co-authored-by: Umut Polat <52835619+umut-polat@users.noreply.github.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: wcy <86111164+wcynb1023@users.noreply.github.com>
Co-authored-by: Uranus <109661872+UranusSeven@users.noreply.github.com>
Co-authored-by: zzt <mf1732009@smail.nju.edu.cn>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Xunzhuo <xunzhuo@vllm-semantic-router.ai>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Krishna Chaitanya <krishnabkc15@gmail.com>
Co-authored-by: wjinxu <1299461899@qq.com>
Co-authored-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Co-authored-by: JooHo Lee <96564470+BWAAEEEK@users.noreply.github.com>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…m-project#43991)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1, v2

2 participants