[ROCm][Bugfix][GPTOSS]: fix input_ids and expert_map args for quark w4a8 gptoss#41165

Merged
gshtras merged 4 commits into vllm-project:main from ROCm:fix_gptoss_w4a8_loading on Apr 29, 2026

@Rohan138 Rohan138 commented Apr 28, 2026

Purpose

#40860 added the input_ids arg to QuarkOCP_MX_MoEMethod.apply_monolithic, but not to QuarkOCP_MX_MoEMethod_OSS.apply_monolithic, so the command below fails during engine startup with this error:

VLLM_ROCM_USE_AITER=1 vllm serve amd/gpt-oss120b-w-mxfp4-a-fp8 --dtype auto -tp 8 --no-enable-prefix-caching --disable-uvicorn-access-log --block-size 64

...

[SERVER] (EngineCore pid=1921) Process EngineCore:
[SERVER] (EngineCore pid=1921) Traceback (most recent call last):
[SERVER] (EngineCore pid=1921)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
[SERVER] (EngineCore pid=1921)     self.run()
[SERVER] (EngineCore pid=1921)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
[SERVER] (EngineCore pid=1921)     self._target(*self._args, **self._kwargs)
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1140, in run_engine_core
[SERVER] (EngineCore pid=1921)     raise e
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1110, in run_engine_core
[SERVER] (EngineCore pid=1921)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
[SERVER] (EngineCore pid=1921)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[SERVER] (EngineCore pid=1921)     return func(*args, **kwargs)
[SERVER] (EngineCore pid=1921)            ^^^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 876, in __init__
[SERVER] (EngineCore pid=1921)     super().__init__(
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 128, in __init__
[SERVER] (EngineCore pid=1921)     kv_cache_config = self._initialize_kv_caches(vllm_config)
[SERVER] (EngineCore pid=1921)                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
[SERVER] (EngineCore pid=1921)     return func(*args, **kwargs)
[SERVER] (EngineCore pid=1921)            ^^^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 250, in _initialize_kv_caches
[SERVER] (EngineCore pid=1921)     available_gpu_memory = self.model_executor.determine_available_memory()
[SERVER] (EngineCore pid=1921)                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 147, in determine_available_memory
[SERVER] (EngineCore pid=1921)     return self.collective_rpc("determine_available_memory")
[SERVER] (EngineCore pid=1921)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 403, in collective_rpc
[SERVER] (EngineCore pid=1921)     return future if non_block else future.result()
[SERVER] (EngineCore pid=1921)                                     ^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 90, in result
[SERVER] (EngineCore pid=1921)     return super().result()
[SERVER] (EngineCore pid=1921)            ^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
[SERVER] (EngineCore pid=1921)     return self.__get_result()
[SERVER] (EngineCore pid=1921)            ^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
[SERVER] (EngineCore pid=1921)     raise self._exception
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
[SERVER] (EngineCore pid=1921)     response = self.aggregate(self.get_response())
[SERVER] (EngineCore pid=1921)                               ^^^^^^^^^^^^^^^^^^^
[SERVER] (EngineCore pid=1921)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
[SERVER] (EngineCore pid=1921)     raise RuntimeError(
[SERVER] (EngineCore pid=1921) RuntimeError: Worker failed with error 'QuarkOCP_MX_MoEMethod_OSS.apply_monolithic() got an unexpected keyword argument 'input_ids'', please check the stack trace above for the root cause
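
For illustration only, here is a minimal sketch of the same failure mode outside vLLM. The class and function names (BaseMoEMethod, OSSMoEMethod, run_layer, try_call) are hypothetical stand-ins, not vLLM's actual code: one method gained an input_ids keyword, a sibling override did not, and the caller passes the keyword to both.

```python
# Toy reproduction of the signature mismatch. All names are hypothetical;
# this is not the vLLM implementation.


class BaseMoEMethod:
    # Signature after the new keyword was introduced.
    def apply_monolithic(self, layer, x, router_logits, input_ids=None):
        return x


class OSSMoEMethod(BaseMoEMethod):
    # Override that was not updated: it does not accept input_ids.
    def apply_monolithic(self, layer, x, router_logits):
        return x


def run_layer(method, layer, x, router_logits, input_ids):
    # The caller always forwards input_ids as a keyword argument.
    return method.apply_monolithic(
        layer, x=x, router_logits=router_logits, input_ids=input_ids
    )


def try_call(method):
    """Return "ok" on success, or the TypeError message on failure."""
    try:
        run_layer(method, layer=None, x=[1.0], router_logits=[0.0], input_ids=[7])
        return "ok"
    except TypeError as exc:
        return str(exc)


print(try_call(BaseMoEMethod()))
print(try_call(OSSMoEMethod()))
```

The second call fails with a TypeError about the unexpected keyword argument 'input_ids', which is the same kind of error the worker reports above.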

cc @BowenBao @fxmarty-amd

Test Plan

Test Result



Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@Rohan138 Rohan138 requested a review from tjtanaa as a code owner April 28, 2026 20:06
@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added gpt-oss Related to GPT-OSS models rocm Related to AMD ROCm bug Something isn't working labels Apr 28, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Apr 28, 2026
@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the apply_monolithic method in vllm/model_executor/layers/quantization/quark/quark_moe.py by refining the layer type hint to FusedMoE and replacing the expert_map parameter with input_ids. The implementation now correctly references layer.expert_map internally. Feedback was provided to correct the return type annotation of apply_monolithic to torch.Tensor to ensure consistency with the actual return value and the base class definitions.
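
As a rough sketch of the shape of the change summarized above (not the actual diff): the method accepts input_ids instead of expert_map and reads the expert map from the layer. ToyLayer and ToyOSSMethod are hypothetical stand-ins, plain lists stand in for torch.Tensor, the toy weighting is invented for the example, and the -1 marker for experts not hosted locally is an assumption.

```python
# Sketch of the post-fix signature. Hypothetical names; lists stand in
# for tensors so the example runs without torch.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyLayer:
    """Stand-in for a FusedMoE layer; only the attribute used here."""

    # Assumption: entry g is the local index of global expert g, or -1
    # if that expert is not hosted on this rank.
    expert_map: Optional[list[int]] = None


class ToyOSSMethod:
    def apply_monolithic(
        self,
        layer: ToyLayer,
        x: list[float],
        router_logits: list[float],
        input_ids: Optional[list[int]] = None,  # accepted so callers can pass it
    ) -> list[float]:  # one consistent return type (torch.Tensor in the review)
        # The expert map comes from the layer, not from a parameter.
        expert_map = layer.expert_map
        if expert_map is None:
            local = list(range(len(router_logits)))
        else:
            local = [g for g, idx in enumerate(expert_map) if idx >= 0]
        # Toy computation: scale x by the summed logits of local experts.
        weight = sum(router_logits[g] for g in local)
        return [weight * v for v in x]


out = ToyOSSMethod().apply_monolithic(
    ToyLayer(expert_map=[0, -1]),
    x=[1.0, 2.0],
    router_logits=[0.5, 0.25],
    input_ids=[3],
)
print(out)
```

With this signature a caller that forwards input_ids no longer raises, and the expert map stays attached to the layer rather than being threaded through every call.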

Comment thread vllm/model_executor/layers/quantization/quark/quark_moe.py Outdated
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@github-project-automation github-project-automation Bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Apr 28, 2026
@gshtras gshtras enabled auto-merge (squash) April 28, 2026 20:57
@github-actions github-actions Bot added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 28, 2026
@gshtras gshtras merged commit 3795d7a into vllm-project:main Apr 29, 2026
66 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in AMD Apr 29, 2026
zyongye pushed a commit to zyongye/vllm that referenced this pull request May 1, 2026
…4a8 gptoss (vllm-project#41165)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…4a8 gptoss (vllm-project#41165)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Adrian <info@zzit.ch>
@gshtras gshtras added this to the v0.20.1 milestone May 1, 2026
khluu pushed a commit that referenced this pull request May 1, 2026
…4a8 gptoss (#41165)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
(cherry picked from commit 3795d7a)
hnt2601 pushed a commit to hnt2601/vllm that referenced this pull request May 2, 2026
…4a8 gptoss (vllm-project#41165)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
RhizoNymph added a commit to RhizoNymph/vllm that referenced this pull request May 2, 2026
commit 8586369f617a964235d0d9d32d6ebb1076a4581d
Author: Matthew Santiago <carag.matthew@gmail.com>
Date:   Sat May 2 01:22:14 2026 -0500

    Refactor Step3Text loading to use AutoWeightsLoader (#41492)

    Signed-off-by: Matthew Santiago <carag.matthew@gmail.com>

commit ae3b4deb8a5987759d4732e67767146a46ee72ed
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Sat May 2 13:27:43 2026 +0800

    [Doc] Add Codex usage example (#41358)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit c293ccc58ef6e1a0976a62f79f57bc045108073d
Author: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com>
Date:   Fri May 1 21:13:15 2026 -0700

    [ROCm][Bugfix] Fix init-time bias dtype cast when gate.out_dtype is None (#41405)

    Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>

commit d58c42e19cb792e24eb335b75164356a4f71bff0
Author: Luka Govedič <ProExpertProg@users.noreply.github.com>
Date:   Fri May 1 23:41:15 2026 -0400

    [vLLM IR] 2/N fused_add_rms_norm and maybe_inplace overload (#36823)

    Signed-off-by: Luka Govedič <lgovedic@redhat.com>
    Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

commit 3e49479c4b766a601804f0c6f5f1c9a3def5ad0c
Author: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Date:   Fri May 1 23:19:07 2026 -0400

    Limit concurrency on `test_transcription_api_correctness.py` (#41478)

    Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

commit 964a4bc2a57aca2a42d04538b27cab4d333d0f5d
Author: John Calderon <81483067+johncalesp@users.noreply.github.com>
Date:   Fri May 1 23:10:14 2026 -0400

    [MM][CG] Support ViT CG for Qwen2.5-VL (#40830)

    Signed-off-by: John Calderon <jcalderon@nvidia.com>

commit c408fdd663afb34ab82a10b26f553bec9e8052d9
Author: FredericOdermatt <50372080+FredericOdermatt@users.noreply.github.com>
Date:   Sat May 2 05:06:54 2026 +0200

    [Fix] Sync gemma4 chat template from hf (#39570)

    Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>

commit 5737770c6c346d918fdfb13e9378f9514f616186
Author: Andy Lo <andy@mistral.ai>
Date:   Sat May 2 00:01:37 2026 +0100

    Re-enable allreduce rms fusion for DP / PP (#41458)

    Signed-off-by: Andy Lo <andy@mistral.ai>

commit 0c99629ede51524f00b88cb758c895fd76a5f6f9
Author: Michael Goin <mgoin64@gmail.com>
Date:   Fri May 1 17:45:03 2026 -0400

    [Build] Make bundled DeepGEMM wheel portable across Python versions (#41476)

    Signed-off-by: mgoin <mgoin64@gmail.com>

commit edd60ac93a3247c7ef1bf1e2a3e9c0e95bc83bf6
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Fri May 1 17:42:52 2026 -0400

    [Bugfix] Fix persistent_topk inter-CTA init race on RadixRowState (#41444)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

commit bcf5cac9fb956788f649d1f5297b74c886a9d6d3
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Fri May 1 15:23:17 2026 -0400

    [DSV4] Add knob to enable pre-attn gemm  (#41443)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

commit a9484dac7b734096ed26db4902454da7e497d2c3
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Sat May 2 03:01:17 2026 +0800

    [Perf] Intergrate Tile Kernels `head_compute_mix_kernel` for Deepseek-V4 (#41255)

    Signed-off-by: Isotr0py <Isotr0py@outlook.com>
    Co-authored-by: Roger Wang <hey@rogerw.io>

commit f3fef123504db07b3ac83ad4ef677915b53e8386
Author: Matthew Bonanni <mbonanni@redhat.com>
Date:   Fri May 1 13:36:20 2026 -0400

    [Attention] Abstract the MLA prefill backends and eliminate cuDNN (#32623)

    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
    Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
    Co-authored-by: Michael Goin <mgoin64@gmail.com>
    Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 51295793a2eed0eefc7505cb9a7d5f96effd7773
Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Date:   Fri May 1 13:02:03 2026 -0400

    [Model Runner V2] Add `logprob_token_ids` support (#40559)

    Signed-off-by: yewentao256 <zhyanwentao@126.com>
    Signed-off-by: Nick Hill <nickhill123@gmail.com>
    Co-authored-by: Nick Hill <nickhill123@gmail.com>

commit 3ccc1ff4958dd07dbffeaa1c48463325c892b518
Author: Michael Goin <mgoin64@gmail.com>
Date:   Fri May 1 12:00:38 2026 -0400

    [Eval][CI] Add basic mrcr eval to tests/evals/ (#40164)

    Signed-off-by: mgoin <mgoin64@gmail.com>

commit 529c671e8075d265a48b72e0eaaeb5e30d2f1630
Author: vllmellm <vllm.ellm@embeddedllm.com>
Date:   Fri May 1 23:07:18 2026 +0800

    [ROCm][FEAT] AITER Fused Allreduce + RMSNorm (#37646)

    Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
    Signed-off-by: Rita Brugarolas Brufau <rita.brugarolasbrufau@amd.com>
    Signed-off-by: junkang1991 <junkangchow@gmail.com>
    Co-authored-by: Rita Brugarolas <Rita.BrugarolasBrufau@amd.com>
    Co-authored-by: junkang1991 <junkangchow@gmail.com>
    Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
    Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

commit bc635fad2389e228a31d6bc6e698caf53d395e13
Author: Pleaplusone <ygan@amd.com>
Date:   Fri May 1 22:06:00 2026 +0800

    [ROCm][Deepseek] dsv3.2 further optimization (#41217)

    Signed-off-by: ganyi <ygan@amd.com>
    Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
    Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>

commit c3e64696cdea5df92eb15e20b18ba979a536c1e3
Author: Artem Perevedentsev <aperevedents@nvidia.com>
Date:   Fri May 1 17:04:11 2026 +0300

    [Perf] Warmup forward_native sampler kernel (#41375)

    Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

commit 4f7bde572ad05a6c013dde5f8874898fff3c1253
Author: sungsoo ha <sungsooh@nvidia.com>
Date:   Fri May 1 06:01:17 2026 -0700

    [Kernel] Pack output and LSE in DCP A2A (#41160)

commit 2fa1f8ec00cf85b15422cb4c0e8eb3632ee13ea8
Author: Or Ozeri <oro@il.ibm.com>
Date:   Fri May 1 14:30:03 2026 +0300

    [kv_offload+HMA][13/N]: Enable HMA support (#41445)

    This is the final PR in a series to enables HMA support for the
    offloading connector. The connector advertises `SupportsHMA`
    and is validated with unit tests and e2e tests.

    Signed-off-by: Or Ozeri <oro@il.ibm.com>

commit 7075df79b3094bb6f6d28021c4df8631af10b2b8
Author: raviguptaamd <ravi.gupta@amd.com>
Date:   Fri May 1 02:18:30 2026 -0700

    [ROCm] Enable DBO (Dynamic Batch Optimization) on ROCm (#34726)

    Signed-off-by: raviguptaamd <ravi.gupta@amd.com>

commit 0dbaf9daad2031235344428d2a574496bb4d9a3b
Author: Yuyi Ao <yuyiao772@gmail.com>
Date:   Fri May 1 05:07:23 2026 -0400

    Refractor longcat loading to use AutoWeightsLoader (#41448)

    Signed-off-by: George-ao <yuyiao772@gmail.com>

commit a3ec4a35f5943c250974d504706d22297d423468
Author: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Date:   Fri May 1 00:43:39 2026 -0700

    [Bugfix][Metrics] Fix RayPrometheusMetric.labels() returning shared labeled child (#40840)

    When vLLM runs with Ray Prometheus `vllm:request_success{finished_reason=...}`
    only ever increments the repetition bucket regardless of the request's actual finish
    reason; stop, length, abort, and error stay at zero. Root cause was `labels()` mutated
    the wrapped Ray metric's default tags in place and returned self, so every `.labels(...)`
    call on a given wrapper returned the same object.

    Co-authored-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
    Co-authored-by: Claude <noreply@anthropic.com>
    Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
    Signed-off-by: Seiji Eicher <seiji@anyscale.com>

commit 32964e770041fa4124f98c66efb3ff721ca608b6
Author: Andreas Karatzas <akaratza@amd.com>
Date:   Fri May 1 02:40:47 2026 -0500

    [ROCm][CI] Upgraded UCX and RIXL (#41210)

    Signed-off-by: Andreas Karatzas <akaratza@amd.com>

commit a07642667db1284ad2128d7f9ef089e6b0d24a4c
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Fri May 1 14:38:02 2026 +0800

    [Bugfix] Pass reasoning parser kwargs to structured output (#41199)

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>
    Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

commit c3868bbbe4b160d89adf339bcc069f6956314345
Author: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Date:   Fri May 1 01:08:34 2026 -0400

     [compile] Add FlashInfer FP8 async TP fusion and preserve allreduce fusion ordering #27893   (#39505)

    Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
    Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
    Signed-off-by: roG0d <baonudesifeizhai@gmail.com>

commit 947138b6c22f9a4751b63b9aa75a2bc4b42835e9
Author: sychen52 <41452870+sychen52@users.noreply.github.com>
Date:   Thu Apr 30 21:55:16 2026 -0700

    Add nvfp4 kv cache support (#40177)

    Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

commit 941fb5083552516eee947fc5f6c4d2031af76ea4
Author: Or Ozeri <oro@il.ibm.com>
Date:   Fri May 1 06:59:17 2026 +0300

    [kv_offload+HMA][12/N]: Scheduler-side support for sliding window groups (#41228)

    Signed-off-by: Or Ozeri <oro@il.ibm.com>

commit 6b6ac6c3c737b69e99264731f010588613dada58
Author: Juhi Mittal <39641197+juhi10071998@users.noreply.github.com>
Date:   Thu Apr 30 20:37:43 2026 -0700

    [Kernel][MoE] Support GELU on TRT-LLM NvFP4 fused MoE for Gemma4 (#41050)

    Signed-off-by: Juhi Mittal <juhim@nvidia.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit b542bdf7fb3611879b30917e03bef62963496e83
Author: Stefano Castagnetta <scastagnetta@nvidia.com>
Date:   Fri May 1 05:08:49 2026 +0200

    [Bugfix] Disable FlashInfer CUTLASS MoE on SM110 (Jetson Thor AGX) (#40808)

    Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>

commit 415a8798996c518bbc33f377c9652c2776e074a7
Author: Ronen Schaffer <ronen.schaffer@ibm.com>
Date:   Fri May 1 05:18:38 2026 +0300

    [KV Offload] Use `Collection` instead of `Sequence/Iterable` for OffloadingManager key parameters (#41361)

    Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

commit 7198940b395e74337101e7b62437c55a7e1d4284
Author: Dong W <89223086+sniper35@users.noreply.github.com>
Date:   Thu Apr 30 19:06:48 2026 -0700

    [Model] Add Moondream3 model support(only query and caption skills) (#32325)

    Signed-off-by: Dong Wang <dongw2019@gmail.com>

commit 14043dfecd35dd2f12b4d51eb9fa166184a0ca0f
Author: Luis 🚀 <luisfabian1545@gmail.com>
Date:   Thu Apr 30 22:05:55 2026 -0400

    feat: Enable `prompt_embeds` Content Part Support in vLLM Chat Completions API (#40720)

    Signed-off-by: Luis Robaina <luis@protopia.ai>
    Signed-off-by: Luis Robaina 🚀 <luisfabian1545@gmail.com>
    Signed-off-by: LuisRobaina <luis@protopia.ai>
    Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com>

commit 1adaa5056b0e1b7ce9918d30baa5c7b8b7d86e0d
Author: Andreas Karatzas <akaratza@amd.com>
Date:   Thu Apr 30 20:59:35 2026 -0500

    [ROCm][CI] Add ROCm score absolute tolerance floor (#41341)

    Signed-off-by: Andreas Karatzas <akaratza@amd.com>

commit 4d5c89295b763e642129fcf598580ec63dc1d45f
Author: Soyaazz <523420504@qq.com>
Date:   Fri May 1 09:59:26 2026 +0800

    (bugfix): block_size check for flex attn (#41363)

    Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>

commit dd5506a15759abfc28ed1ca2746221c9ab5e1180
Author: Nick Hill <nickhill123@gmail.com>
Date:   Thu Apr 30 18:10:00 2026 -0700

    [Core] Simplify handling of `scheduler_reserve_full_isl` option (#41064)

    Signed-off-by: Nick Hill <nickhill123@gmail.com>

commit a3c83ff2fd050bc6392260d6e84ee1150f238f26
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Thu Apr 30 21:09:55 2026 -0400

    Faster per-token fp8 group quant packed kernel for blackwell (#41326)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
    Co-authored-by: Roger Wang <hey@rogerw.io>

commit 9c61864bf8a911a8369f35d79d538c7f11cf3dc2
Author: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Date:   Thu Apr 30 16:28:57 2026 -0700

    [DeepSeek] Use torch.mm for bf16xbf16->fp32 gemm (#41300)

    Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

commit 71725f6730ec7076ee914ba06fae482a10f2f159
Author: Tran Le <43319264+lequytra@users.noreply.github.com>
Date:   Thu Apr 30 16:19:59 2026 -0700

    [Bugfix] Fix RoutedExpertsCapturer for Gemma 4 MoE (top_k_experts) (#41401)

    Signed-off-by: Tran Le <tranle@fireworks.ai>

commit b4806c8ee12d5c5bbfebd6070b389e2f4daad1fd
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Thu Apr 30 18:33:12 2026 -0400

    [DSV4] Add BF16 and MXFP8 A2A support for flashinfer a2a one sided (#40960)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
    Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
    Co-authored-by: Zijing Liu <liuzijing2014@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 526927be94c63ee677840c9074754fb6fdbdf5a1
Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Date:   Thu Apr 30 18:20:11 2026 -0400

    [Model Runner v2] Fix v2 compile counter `num_gpu_runner_capture_triggers` and `num_cudagraph_captured` (#41285)

    Signed-off-by: yewentao256 <zhyanwentao@126.com>

commit 75a4c166f25f8407591328d7ed92aa2231b0b841
Author: Michael Goin <mgoin64@gmail.com>
Date:   Thu Apr 30 18:02:14 2026 -0400

    Fix typo in log message for indexer cache (#41419)

    Signed-off-by: Michael Goin <mgoin64@gmail.com>

commit 2917d6363ad722bff647168d3a36261254d7ad42
Author: fxmarty-amd <felmarty@amd.com>
Date:   Thu Apr 30 23:35:48 2026 +0200

    [NVFP4][Hopper/AMD Instinct] Add Triton kernels for NVFP4 dequantization and QDQ emulation (#40033)

    Signed-off-by: Felix Marty <Felix.Marty@amd.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit efb4cdf2b8000c850d04706eb6f788903e3ee544
Author: Stefano Castagnetta <scastagnetta@nvidia.com>
Date:   Thu Apr 30 21:47:55 2026 +0200

    [CI/Build] Skip Prithvi/Terratorch model-registry tests when terratorch is missing (#41389)

    Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>

commit 92a7c121b62a1484b68c0a27d1ecefd1a84f78fc
Author: Stefano Castagnetta <scastagnetta@nvidia.com>
Date:   Thu Apr 30 21:24:09 2026 +0200

    [CI] Add MTP coverage: Qwen3.5 correctness + no-sync spec decode (#40472)

    Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 307b17ce33165b41c490a099f54f0d0cd12e7f76
Author: Jee Jee Li <pandaleefree@gmail.com>
Date:   Fri May 1 00:57:27 2026 +0800

    [DSV4] Avoid redundant dtype conversion. (#41374)

    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

commit 3ca6ca210fc2cbf786713f515a6af2fba875884b
Author: wenjun liu <wenjun.liu@intel.com>
Date:   Fri May 1 00:02:23 2026 +0800

    xpu docker: pin oneAPI to 2025.3 and avoid unintended 2026 upgrade (#41380)

    Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

commit 10558f5f4608f825d06018def1317e7d3a96d6fe
Author: Stefano Castagnetta <scastagnetta@nvidia.com>
Date:   Thu Apr 30 16:59:07 2026 +0200

    [CI/Build] Skip terratorch + torchgeo while PyPI has lightning quarantined (#41377)

    Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>

commit 121dbe7a221d2bf6415caea0acb812c666a7665e
Author: tej <37236721+itej89@users.noreply.github.com>
Date:   Thu Apr 30 09:46:59 2026 -0500

    [ROCm] ROCm DeepEP API updated to latest (#39721)

    Signed-off-by: Tej Kiran <vpolamre@amd.com>
    Signed-off-by: tej <37236721+itej89@users.noreply.github.com>
    Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
    Co-authored-by: HAIAI <39548240+HAIAI@users.noreply.github.com>

commit f03d82efdd88fbd85ddf7a5475e237ae3abaf01e
Author: Matthew Bonanni <mbonanni@redhat.com>
Date:   Thu Apr 30 10:46:54 2026 -0400

    [UX][Bugfix] Fix OOM by setting PyTorch `max_split_size_mb` during model loading (#41268)

    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

commit a7fb00851030b6991987c559b885dfd8eb039d15
Author: Ilya Markov <markovilya197@gmail.com>
Date:   Thu Apr 30 16:46:49 2026 +0200

    [EPLB] Optimize memory overhead in Nixl communicator (#40013)

    Signed-off-by: ilmarkov <markovilya197@gmail.com>
    Signed-off-by: Markov Ilya <markovilya19@gmail.com>
    Co-authored-by: Markov Ilya <markovilya19@gmail.com>
    Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

commit ff449b6426812d1e5e107715af899fcff5e81419
Author: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Date:   Thu Apr 30 13:48:38 2026 +0100

    Stop mergify labelling from skipping pre-commit (#41362)

    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

commit 3527229517f01a5f2406fa6fbf35ff9223c65ed5
Author: Stefano Castagnetta <scastagnetta@nvidia.com>
Date:   Thu Apr 30 14:06:44 2026 +0200

    [Doc] Fix RTD build: pytorch.org/docs/stable/objects.inv returns 404 (#41353)

    Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
    Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

commit b55b26520c088eeed791cbe4648c73ea015f3613
Author: Xiaoshuang Wang <1790571317@qq.com>
Date:   Thu Apr 30 18:31:08 2026 +0800

    [MoE] Make MoERunnerInterface a PluggableLayer for OOT support (#35178)

    Signed-off-by: wxsIcey <1790571317@qq.com>
    Signed-off-by: Icey <1790571317@qq.com>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 3179e53135dbc0bbb9d845fb56e5380a6b88157e
Author: snadampal <87143774+snadampal@users.noreply.github.com>
Date:   Thu Apr 30 03:14:20 2026 -0700

    [P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes (#32553)

    Signed-off-by: Sunita Nadampalli <nadampal@amazon.com>

commit efdc95674db5c7b441d52ae02fa57e57c6bb3855
Author: Nicolò Lucchesi <nlucches@redhat.com>
Date:   Thu Apr 30 11:10:50 2026 +0200

    [KVConnector] MultiConnector SupportsHMA (#39571)

    Signed-off-by: NickLucche <nlucches@redhat.com>

commit 54146a9bf951b8c70ad85fb1a1bee241964209e0
Author: Chenxi Qian <chenxi.qian.cq@outlook.com>
Date:   Thu Apr 30 16:22:41 2026 +0800

    [Bugfix] correct h matrix layout in chunk_kda output kernel (#40956)

    Signed-off-by: ChenxiQian <chenxi.qian.cq@outlook.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit ca97f7b9bbf2e904065dafc6918af3f9f386fdf0
Author: Baekpica <35071468+Baekpica@users.noreply.github.com>
Date:   Thu Apr 30 16:12:42 2026 +0900

    Fix Gemma4 MoE expert weight remapping (#41206)

    Signed-off-by: sunghoon.baek <sunghoon.baek@connectfy.cloud>
    Co-authored-by: sunghoon.baek <sunghoon.baek@connectfy.cloud>
    Co-authored-by: OpenAI Codex <codex@openai.com>

commit a04e0cf3b8cd7bf6b643eacab15033025f462166
Author: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Date:   Thu Apr 30 02:39:04 2026 -0400

    Fix Cohere ASR after HF upgrade (#40582)

    Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>

commit cb1b02d0e8159f25678b96667651ec539546022b
Author: Dhruv Singal <dhruvsingalabc@gmail.com>
Date:   Wed Apr 29 23:19:09 2026 -0700

    [Frontend] Add VLLM_SKIP_MODEL_NAME_VALIDATION environment variable (#34676)

    Signed-off-by: Dhruv Singal <dhruvsingalabc@gmail.com>
    Signed-off-by: Dhruv Singal <dsingal@Dhruvs-MacBook-Pro.local>
    Signed-off-by: Your Name <you@example.com>
    Signed-off-by: vLLM Assistant <assistant@vllm.ai>
    Signed-off-by: Simon Mo <simon.mo@hey.com>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
    Co-authored-by: Dhruv Singal <dsingal@Dhruvs-MacBook-Pro.local>
    Co-authored-by: Your Name <you@example.com>
    Co-authored-by: OpenCode <noreply@openai.com>
    Co-authored-by: Simon Mo <simon.mo@hey.com>

commit a749a33d8d05acdd3ab346bd3f0c6b5c9c80474f
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Thu Apr 30 00:03:45 2026 -0400

    [Bugfix] Fix persistent_topk cooperative deadlock at TopK=1024 (#41189)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit c42981d034713da814913ebd2e53269346f3ecea
Author: Martin Hickey <martin.hickey@ie.ibm.com>
Date:   Thu Apr 30 03:55:31 2026 +0100

    [Refactor][kv_offload] KV Offloading maintainability improvements (#40538)

    Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
    Co-authored-by: Or Ozeri <oro@il.ibm.com>

commit 0ff1bf9bb1ee31ba1f416a4688e705be92643711
Author: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Date:   Wed Apr 29 21:44:07 2026 -0400

    [Bugfix] Fix failure to allocate KV blocks error (#41282)

    Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

commit 0ab67c02225c18ed664864f75c3a38c0659f4be1
Author: Kevin H. Luu <khluu000@gmail.com>
Date:   Wed Apr 29 16:59:16 2026 -0700

    [CI] Add key field to all test_areas pipeline steps (#41201)

    Signed-off-by: khluu <khluu000@gmail.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit 3795d7acf431980e62e738493f437ae2a51549da
Author: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Date:   Wed Apr 29 18:39:01 2026 -0500

    [ROCm][Bugfix][GPTOSS]: fix input_ids and expert_map args for quark w4a8 gptoss (#41165)

    Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

commit 18599bfdf2f9dc117b004491f1eba766934310fc
Author: Nick Hill <nickhill123@gmail.com>
Date:   Wed Apr 29 16:31:00 2026 -0700

    [Ci][BugFix] Fix slow DP tests due to bad teardown logic (#41166)

    Signed-off-by: Nick Hill <nickhill123@gmail.com>

commit 296741d0257107a9d0301409005c85d38bb247bc
Author: Thien Tran <gau.nernst@yahoo.com.sg>
Date:   Thu Apr 30 06:16:40 2026 +0700

    [DSv4] Use `cvt` PTX for FP32->FP4 conversion (#41015)

    Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

commit a966aaed30b9132191e8ba88d4f61e76657b690d
Author: Uranus <109661872+UranusSeven@users.noreply.github.com>
Date:   Thu Apr 30 07:14:50 2026 +0800

    [Bugfix][MLA] Size arange_buffer to max_num_batched_tokens to prevent CUDA IMA (#39277)

    Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
    Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
    Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>

commit 6841f5dc77e9200a2fa45a4bf935b23bd843bf30
Author: Hemanth Acharya <heachary@amd.com>
Date:   Thu Apr 30 04:37:46 2026 +0530

    [ROCm] Add env flags to disable dynamic MXFP4 quant and enable AITER tuned GEMMs for Attention Projection Layers (#39987)

    Signed-off-by: Hemanth Acharya <heachary@amd.com>

commit c2fb013312e107c6809b1bf5cc4f22e499e1b81d
Author: roikoren755 <26850796+roikoren755@users.noreply.github.com>
Date:   Thu Apr 30 00:59:18 2026 +0300

    [Bugfix][Compile] Fix gc.collect/empty_cache patch arity in CUDAGraphWrapper (#41235)

    Signed-off-by: Roi Koren <roik@nvidia.com>

commit ccfb620c62533c0dbfa8d5a0307fab9682b7c29f
Author: Rishi Puri <riship@nvidia.com>
Date:   Wed Apr 29 18:56:56 2026 -0300

    Create tests/distributed/test_mnnvl_alltoall.py (#35241)

    Signed-off-by: Rishi Puri <riship@nvidia.com>
    Signed-off-by: Claude <claude@anthropic.com>
    Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
    Co-authored-by: Claude <claude@anthropic.com>
    Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>

commit 0335316a9ba245e5e82a20ef1b53ba3da108afd5
Author: Aaron Hao <ahao@anyscale.com>
Date:   Wed Apr 29 14:51:03 2026 -0700

    [BUG] Two phase pause to prevent deadlock (#39366)

    Signed-off-by: ahao-anyscale <ahao@anyscale.com>
    Signed-off-by: Aaron Hao <ahao@anyscale.com>
    Co-authored-by: Junjie Zhang <junj.jay.zhang@gmail.com>
    Co-authored-by: Nick Hill <nickhill123@gmail.com>

commit 944e138bcf39e9236bbfd49d98f00fb45e6cea54
Author: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Date:   Wed Apr 29 16:39:03 2026 -0500

    [ROCm][Bugfix]: W4A4 MOE using emulation instead of AITER on MXFP4-supported hardware (#41175)

    Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

commit b58669cb427effa49928b7be5b6e0f4fd707bce5
Author: Luochao Wang <wangluochao902@gmail.com>
Date:   Wed Apr 29 14:20:13 2026 -0700

    [Perf][Spec Decode] Avoid per-step numpy allocation in prepare_next_t… (#41043)

    Signed-off-by: wangluochao902 <wangluochao902@gmail.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 1628239eb234739646e21c4053e3fa652058e245
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Thu Apr 30 05:16:19 2026 +0800

    [Multimodal][Render] Skip mm processor initialization and warmup for text-only mode (#41246)

    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

commit 93da1fe97abf71ac81e7daea21547292f9b39aa4
Author: yzong-rh <yzong@redhat.com>
Date:   Wed Apr 29 17:01:57 2026 -0400

    [CI] Add temperature to bfcl eval, default greedy (#41059)

    Signed-off-by: Yifan Zong <yzong@redhat.com>

commit 169988a3c0e0912fc20be2d104a4b76a51ad9fa4
Author: Andrew Barnes <bortstheboat@gmail.com>
Date:   Wed Apr 29 16:46:01 2026 -0400

    [ROCm] Use quant_dtype in per_token_quant instead of hardcoded FP8 (#39121)

    Signed-off-by: Bortlesboat <bortstheboat@gmail.com>

commit faab18955407f256c7ced2d227ce097f472db16d
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Thu Apr 30 03:15:35 2026 +0800

    [Feature]: IndexCache support for DSA models (#37735)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 6f20f81cbf1d12dc9d499d25ea0a64ef4c816c00
Author: Laith Sakka <lsakka@meta.com>
Date:   Wed Apr 29 11:32:15 2026 -0700

    Replace shape_invariants with simpler approach in dynamic_arg_dims utilizing shape_id property. (#36194)

    Signed-off-by: Laith Sakka <lsakka@meta.com>

commit d1a75e303d81eaaa3d0bb5622e0a6d380ccc22fa
Author: danisereb <daserebrenik@nvidia.com>
Date:   Wed Apr 29 20:39:49 2026 +0300

    Fix timeout when using LoRA adapters with Nemotron Super (#40916)

    Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

commit 4a42aba380bf9cac47009a7307a9d91dd2222d84
Author: Cyrus Leung <tlleungac@connect.ust.hk>
Date:   Thu Apr 30 00:48:52 2026 +0800

    [CI/Build] Enable FP8 on NVIDIA Thor (#39712)

    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

commit a80d6f150c39e9e7121a54d293aa1d09619118c2
Author: Avshalom Manevich <avshalom.manevich@hcompany.ai>
Date:   Wed Apr 29 18:48:47 2026 +0200

    better logging for large uncachable items (#41145)

    Signed-off-by: h-avsha <avshalom.manevich@hcompany.ai>

commit 91a2d3901416fcff11e192f32683ca963726989b
Author: Terrence Zhao <32208165+Terrencezzj@users.noreply.github.com>
Date:   Wed Apr 29 11:54:54 2026 -0400

    [Models] Cohere MoE (#40817)

    Signed-off-by: Terrencezzj <terrence@cohere.ai>

commit a05848e255614401e3813c656b8cfa94969952d4
Author: Frederik Gossen <frgossen@meta.com>
Date:   Wed Apr 29 11:32:03 2026 -0400

    [Bugfix] Report compile time for in-memory cache hit path (#41023)

    Signed-off-by: Frederik Gossen <frgossen@meta.com>

commit 51fda1ba44ff3fd08e9202ce4f404cf3a1feaec1
Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Date:   Wed Apr 29 11:30:33 2026 -0400

    [Model Runner v2] Fix block table IMA issue (#40648)

    Signed-off-by: yewentao256 <zhyanwentao@126.com>
    Signed-off-by: Nick Hill <nickhill123@gmail.com>
    Co-authored-by: Nick Hill <nickhill123@gmail.com>

commit 39a7f4f4e2635012ead0ad127970d7b6778890af
Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Date:   Wed Apr 29 11:11:04 2026 -0400

    [Perf] Optimize `AllPool.forward` by slicing first, 51% faster in the method level benchmark (#41163)

    Signed-off-by: yewentao256 <zhyanwentao@126.com>

commit b92ef9ec5a041b538f44d9584bef0e34bfbeecd1
Author: Artem Perevedentsev <aperevedents@nvidia.com>
Date:   Wed Apr 29 18:10:34 2026 +0300

    [Perf] Enable FlashInfer top-k/top-p sampler by default (#40376)

    Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

commit 5560cac7e25b1a2c3c15506c885af4911c5611d9
Author: Lalithnarayan C <Lalithnarayan.C@amd.com>
Date:   Wed Apr 29 19:51:55 2026 +0530

    [Bugfix][CPU] Backport PT cpp codegen indirect_assert scalar-mask fix (#40973)

    Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 5b39b268f506150dbab38f6f6c04b7c843e37c07
Author: pmaybank <113125070+pmaybank@users.noreply.github.com>
Date:   Wed Apr 29 13:57:58 2026 +0100

    hf_name argument for vllm bench throughput CLI (#41012)

    Signed-off-by: Philip Maybank <pmaybank@amd.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit 22524f7a92b71c8e65eade20ef274fa3b4006d3e
Author: Tianmu Li <tianmu.li@intel.com>
Date:   Wed Apr 29 05:43:21 2026 -0700

    [Feat] CPU fp8 attn for AMX/AVX-512 (#39445)

    Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
    Co-authored-by: Claude <noreply@anthropic.com>
    Co-authored-by: Li, Jiang <jiang1.li@intel.com>

commit 9d8ad5b408bf447e41a3629fc21a453720aaf52b
Author: Jee Jee Li <pandaleefree@gmail.com>
Date:   Wed Apr 29 20:29:55 2026 +0800

    [Bugfix] Fix repeated DSv4 RoPE cache initialization (#41148)

    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

commit 11b69129e2221b64302fb672552c0bc04dddece5
Author: Jared Wen <w13431838023@gmail.com>
Date:   Wed Apr 29 19:35:50 2026 +0800

    [Frontend] Add `defer_loading` and `tool_reference` support for Anthropic and OpenAI APIs  (#40190)

    Signed-off-by: JaredforReal <w13431838023@gmail.com>
    Signed-off-by: sfeng33 <4florafeng@gmail.com>
    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
    Co-authored-by: sfeng33 <4florafeng@gmail.com>
    Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

commit 33f36d42605a476a09ed75936e7c931cb8b432c5
Author: Bugen Zhao <i@bugenzhao.com>
Date:   Wed Apr 29 19:03:47 2026 +0800

    [DSV4] Support `max` reasoning effort (#40982)

    Signed-off-by: Bugen Zhao <i@bugenzhao.com>

commit 37e288214bc3fa89d974b4d323373f2b2878d604
Author: Ronen Schaffer <ronen.schaffer@ibm.com>
Date:   Wed Apr 29 13:50:42 2026 +0300

    [KV Offload] Tighten `keys` type from `Iterable` to `Sequence` in `OffloadingManager` (#41200)

    Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

commit 5371d6fb4023a1a08021135e46e9354ba0923e50
Author: Rohit Kumar Singh <9626333+SKRohit@users.noreply.github.com>
Date:   Wed Apr 29 15:47:51 2026 +0530

    Fix PP in Gemma4 (#40786)

    Signed-off-by: Rohit kumar Singh <rksingh@habana.ai>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

commit 6d7d4da99e41c4ccc0d52d74e2bf36d1ff31034d
Author: Jiangyun Zhu <riverclouds.zhu@qq.com>
Date:   Wed Apr 29 18:08:55 2026 +0800

    [Bugfix] BailingMoeV2.5: rotate full qk_rope_head_dim in MLA RoPE (#41185)

    Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

commit 3f1a4bb639a9b65e2634a6529c90da36944d6472
Author: Alec <35311602+alec-flowers@users.noreply.github.com>
Date:   Wed Apr 29 03:07:41 2026 -0700

    build: embed image provenance metadata in vLLM containers (#40653)

    Signed-off-by: Alec Flowers <aflowers@nvidia.com>
    Co-authored-by: OpenAI Codex <codex@openai.com>

commit 762022cafb1afc4c51ce706c043e2f1f5826003a
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Wed Apr 29 17:55:07 2026 +0800

    [Bugfix] DSV32/V4 add missing type conversion for non-streaming tool calls (#41198)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit 3885d340a4779c54662b10892555ae6928b3a090
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Wed Apr 29 17:11:27 2026 +0800

    [Frontend]Responses API supports Tool/Function calling with streaming with named tool/function (#41110)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit ef70057ca76688fc786c7fdee926ce2bd129b2c0
Author: haosdent <haosdent@gmail.com>
Date:   Wed Apr 29 16:28:45 2026 +0800

    [CI][CPU] Split CPU-Distributed Tests into per-scenario labels (#41203)

    Signed-off-by: haosdent <haosdent@gmail.com>

commit e48cb85185d792f5b4a595c2af3cbc37ac742aac
Author: Shengqi Chen <harry-chen@outlook.com>
Date:   Wed Apr 29 15:37:14 2026 +0800

    [CI/Build] Auto-detect manylinux ABI tag for nightly wheels (#41149)

    Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
    Co-authored-by: Claude <noreply@anthropic.com>

commit 92879e12ba130e12bcc2728939eba86b2644122f
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Wed Apr 29 15:32:37 2026 +0800

    [CI] fix test_rotary_embedding_opcheck format error (#41202)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit 68dd7db81001267c846907769adc14bb32566190
Author: rishitdholakia13 <123388671+rishitdholakia13@users.noreply.github.com>
Date:   Wed Apr 29 02:14:52 2026 -0400

    [Reasoning] Support for speculative decoding with thinking budget (#34668)

    Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
    Signed-off-by: rishitdholakia13 <123388671+rishitdholakia13@users.noreply.github.com>
    Co-authored-by: Nick Hill <nickhill123@gmail.com>
    Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

commit 8a8c9b564ef015c76cf398200b8f0891e6e51bb8
Author: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Date:   Wed Apr 29 08:52:55 2026 +0300

    [KV Offload] Per-job store completion for CPU offloading connector (#39186)

    Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
    Signed-off-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
    Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
    Co-authored-by: Or Ozeri <or@ozery.com>
    Co-authored-by: Or Ozeri <oro@il.ibm.com>

commit a269744e9f733ec9bac4bb6a33f70cc5af38afc3
Author: Jee Jee Li <pandaleefree@gmail.com>
Date:   Wed Apr 29 13:42:35 2026 +0800

    [Bugfix] Fix rope  (#41113)

    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

commit 8b49cf3a37eb1a267a06b0df23328909330af1e6
Author: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Date:   Wed Apr 29 00:33:06 2026 -0400

    [Bugfix] Fix max_num_batched_token not captured in cuda graph  (#40734)

    Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
    Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
    Co-authored-by: Wei Zhao (Engrg-Hardware 1) <weizha@login-bia02.bia.clusters.nvidia.com>

commit 2ae73c758ceed55ad2f70a69b47c8a994fce5662
Author: Jiangyun Zhu <riverclouds.zhu@qq.com>
Date:   Wed Apr 29 12:18:46 2026 +0800

    [Bugfix] fix inductor error for dpsk v4 (#41135)

    Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

commit d95d03c719cd69e634e567d6cad3228557151393
Author: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
Date:   Wed Apr 29 05:08:35 2026 +0100

    [BugFix][CPU] fix error on CPU runner shutdown (#41034)

    Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

commit 803b9d7881cd3a8482aaa1e6bf990193b55c6331
Author: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Date:   Wed Apr 29 00:08:16 2026 -0400

    [Bugfix] Fix Deepseek V4 import error due to AOT compile cache loading (#41090)

    Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
    Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>

commit 1312f0753115cb36410334e3667961d1237a287b
Author: Walter Beller-Morales <walterbm@users.noreply.github.com>
Date:   Wed Apr 29 00:07:53 2026 -0400

    [Feature] add cohere reasoning and tool parsers (#40422)

    Signed-off-by: walterbm <walter.beller.morales@gmail.com>

commit fa1b9840f6d87ef6e3b247a78514ccc1d6e5f1ce
Author: Lucas Kabela <lucasakabela@gmail.com>
Date:   Tue Apr 28 21:07:24 2026 -0700

    [BE][Torch 2.12] Remove workaround code for fixed cublas issue (#40845)

    Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
    Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>
    Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

commit 916e56c05c997155b865dd4f46172f26e755da3d
Author: Kyle Sayers <kylesayrs@gmail.com>
Date:   Wed Apr 29 00:06:54 2026 -0400

    [QeRL] Add warnings for extra memory buffering  (#40309)

    Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
    Co-authored-by: Flora Feng <4florafeng@gmail.com>

commit a085b5257dd8cc8d6c255e9b92e4642ee12fc3aa
Author: Kyle Sayers <kylesayrs@gmail.com>
Date:   Wed Apr 29 00:06:38 2026 -0400

    [Docs] [QeRL] Layerwise Reloading Documentation (#40317)

    Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
    Co-authored-by: Flora Feng <4florafeng@gmail.com>

commit 7fd05e05aeb3664ca19346771dc559d93423acd4
Author: liangel-02 <liangel@meta.com>
Date:   Wed Apr 29 00:05:14 2026 -0400

    uncomment flex backend for batch invariant mode (#40842)

    Signed-off-by: Angel Li <liangel@meta.com>

commit 99255f3cb5cec7466bf9fa5310fd310baf87d711
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Wed Apr 29 12:04:49 2026 +0800

    [UX] Allow enable/disable model weights loading tracking by config (#41086)

    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
    Co-authored-by: Copilot <copilot@github.com>

commit 75a7cf2c10f2dcc484c3e0444af33e0eaf3f4311
Author: haosdent <haosdent@gmail.com>
Date:   Wed Apr 29 11:23:59 2026 +0800

    [CI] De-flake test_chat_completion_n_parameter_non_streaming (#41147)

    Signed-off-by: haosdent <haosdent@gmail.com>

commit 4b95e9cec4e9a1a90d3f4b2afa62e88459b2b90e
Author: haosdent <haosdent@gmail.com>
Date:   Wed Apr 29 10:23:26 2026 +0800

    [CI] Return HTTP 400 for unsupported chat content part type (#41121)

    Signed-off-by: haosdent <haosdent@gmail.com>

commit 856b15c62c8a574a1a0a289444d5b9a8120433e3
Author: rasmith <Randall.Smith@amd.com>
Date:   Tue Apr 28 21:12:17 2026 -0500

    [CI][AMD][BugFix] Patch has_flashinfer decorator for test_select_rocm_aiter_backend  (#41072)

    Signed-off-by: Randall Smith <Randall.Smith@amd.com>

commit 6fb3f7b46b12ea63265afbe6d53d6f15a5de7b3a
Author: qizixi <22851944+zixi-qi@users.noreply.github.com>
Date:   Tue Apr 28 17:22:03 2026 -0700

    [DSV4] Align aux stream API with DeepseekV4DecoderLayer (#41171)

    Signed-off-by: zixi-qi <zixi@inferact.ai>

commit d109eacd05f774008c7e1d17afc76fc48c4fcbc5
Author: chelnnexy <86009079+chelnnexy@users.noreply.github.com>
Date:   Tue Apr 28 19:04:53 2026 -0500

    [Bugfix][ROCm] Fix gemm_a4w4 call to use updated AITER API signature (#40754)

    Signed-off-by: cheiluno <cheiluno@amd.com>

commit e68fa1b90a7bc52510c11fe2edeae11db15f98fc
Author: Nick Hill <nickhill123@gmail.com>
Date:   Tue Apr 28 15:44:09 2026 -0700

    [Core] Account for `num_gpu_blocks_override` in `max_model_len` checks (#41069)

    Signed-off-by: Nick Hill <nickhill123@gmail.com>

commit f05f3664c35804bf2b5b64eecd17ddfdbb8ed5e3
Author: Russell Bryant <rbryant@redhat.com>
Date:   Tue Apr 28 17:53:19 2026 -0400

    [Doc] Add missing API endpoints to security documentation (#40532)

    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit e9f8f31e9a4c31d6842ca1adffe2619ed204fafb
Author: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Date:   Tue Apr 28 21:22:20 2026 +0200

    [FEATURE] Add EagleMistralForCausalLM (#41024)

    Signed-off-by: juliendenize <julien.denize@mistral.ai>

commit de3fe8dc62f3d77eb8dab8125ca90436f606bccb
Author: yangrz <37785043+yangrz7@users.noreply.github.com>
Date:   Wed Apr 29 02:38:43 2026 +0800

    [Bugfix] release KV blocks for skipped P-ranks to prevent invalid KV errors and timeouts when P_tp > D_tp and MLA (#40449)

    Signed-off-by: yangruize <yangruize7@163.com>
    Co-authored-by: Roger Wang <hey@rogerw.io>

commit 0899f436aab42f798fb8e728872334c83aaebb79
Author: Joe Rowell <joerowell4@gmail.com>
Date:   Tue Apr 28 20:23:00 2026 +0200

    [New Model] Laguna XS.2 implementation (#41129)

    Signed-off-by: Joe Rowell <joerowell4@gmail.com>
    Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
    Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>

commit 358a755e43b07b9454904df9d3c3fae3340058f1
Author: rasmith <Randall.Smith@amd.com>
Date:   Tue Apr 28 13:14:59 2026 -0500

    [CI][AMD][BugFix] Update request URL in test_moriio_connector to match vllm-router compatibility changes (#41076)

    Signed-off-by: Randall Smith <Randall.Smith@amd.com>

commit a60883644be0bcf5219b792b5abbc448e4ea0dcf
Author: Benoit Tigeot <benoittgt@users.noreply.github.com>
Date:   Tue Apr 28 19:27:18 2026 +0200

    [Build] Defer flashinfer cubin download to avoid ~2.5 GB (decompressed) layer duplication (#41134)

    Signed-off-by: Benoit Tigeot <benoit.tigeot@lifen.fr>

commit 5aa371dc8e38e053754d89b444abca0a1d63f676
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Tue Apr 28 12:08:55 2026 -0400

    [DSV4] Enable Multi-stream for Pre-Attn GEMM (#41061)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

commit de3da0b97cd9db8b1d429312992a5759c89ef881
Author: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Date:   Tue Apr 28 18:38:48 2026 +0800

    Add tuned triton fused_moe configs on H100 for gpt-oss (#39904)

    Signed-off-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>

commit 9e92de51c61a47e5abb32d99b1930862473741d5
Author: Roy Wang <jasonailu87@gmail.com>
Date:   Tue Apr 28 15:52:54 2026 +0800

    [Bugfix] Exclude numa_bind fields from ParallelConfig DP hash (#41098)

    Signed-off-by: yasong <yasong.wang@inferact.ai>

commit bde0efdbb78a57dc10375e8d0686cf862332192c
Author: artem-spector <artem_spector@yahoo.com>
Date:   Tue Apr 28 10:43:30 2026 +0300

    [Bugfix][Granite4Vision] Fix deepstack buffer causing decode slowdown in compiled mode (#40917)

    Signed-off-by: artemspector <artems@il.ibm.com>
    Co-authored-by: artemspector <artems@il.ibm.com>

commit ea74f701db6c0dd4b2d954f5e79841101d0d8a5d
Author: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
Date:   Tue Apr 28 15:33:49 2026 +0800

    Bugfix: fix SpecBench sample argument error (#40927)

    Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

commit a8208e6a81befd781b2a9a8b6b29fd61f5333c66
Author: wang.yuqi <yuqi.wang@daocloud.io>
Date:   Tue Apr 28 15:33:41 2026 +0800

    [Examples] Resettle features examples. (#40995)

    Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

commit 76c9cccc368fde7f1b8d8a546e3638f4f434c8fd
Author: anthonsu <50185138+anthonsu@users.noreply.github.com>
Date:   Mon Apr 27 23:42:47 2026 -0700

    [Core] Fix redundant None append in StepPool.forward for chunked prefill (#41049)

    Signed-off-by: Anthony Su <xsuanthony@gmail.com>

commit ed57f771923703998a17ad656536ffb460447a2c
Author: JiangWeixiang <854746559@qq.com>
Date:   Tue Apr 28 13:39:23 2026 +0800

    [Bugfix ] fix bailing_moe_linear (#40859)

    Signed-off-by: ghphotoframe <854746559@qq.com>

commit 7a1eb8ac2ec4ea69338c51dc7afd4b15010abfa8
Author: Jiangyun Zhu <riverclouds.zhu@qq.com>
Date:   Tue Apr 28 12:52:54 2026 +0800

    [Model] update for mimo v25 (#41029)

    Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
    Signed-off-by: Isotr0py <Isotr0py@outlook.com>
    Co-authored-by: Isotr0py <Isotr0py@outlook.com>
    Co-authored-by: Copilot <copilot@github.com>

commit c2e88a281c53059d023a2aee43217a7379509a4a
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Tue Apr 28 12:43:04 2026 +0800

    [Bugfix] Fix broken example openai client (#41088)

    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

commit fd74c90d9c3b5c35308f1f0ab308469235fa5277
Author: Matthew Bonanni <mbonanni@redhat.com>
Date:   Mon Apr 27 22:38:09 2026 -0400

    [Attention][Spec Decode] Allow independent drafter attention backend selection (#39930)

    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

commit 146f44b77d079f5a16fe7094fa0dde6b1be95f38
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Tue Apr 28 10:36:58 2026 +0800

    [Frontend]Responses API supports Tool/Function calling with streaming with required (#40700)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

commit 0d4f71420822a6f9eb386fe1f3f690ecbf31153b
Author: yzong-rh <yzong@redhat.com>
Date:   Mon Apr 27 22:36:54 2026 -0400

    [Bugfix] Remove tokenizer encode/decode calls from Olmo3 reasoning parser (#40855)

    Signed-off-by: Yifan <yzong@redhat.com>
    Co-authored-by: Flora Feng <4florafeng@gmail.com>

commit 03aeed802f374c0319ad9eca34fae8e7e784769a
Author: Angela Yi <angelayi@meta.com>
Date:   Mon Apr 27 17:51:15 2026 -0700

    [Test] Fix test_dynamic_shapes_compilation for torch 2.12 (#40743)

    Signed-off-by: Angela Yi <angelayi@meta.com>

commit 2c8b76c5cb2683f05650f20d90a63f3d9799e909
Author: Jee Jee Li <pandaleefree@gmail.com>
Date:   Tue Apr 28 08:16:55 2026 +0800

    [Model][DSV4] Support base model (#41006)

    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

commit 407b34be263320f843c0251af64a8521760871ea
Author: Kunshang Ji <kunshang.ji@intel.com>
Date:   Tue Apr 28 08:04:54 2026 +0800

    [xpu] bump up vllm-xpu-kernel v0.1.7 (#41019)

    Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

commit 4c7c69b4e0aa7062b8a48268abb06c041bcec53d
Author: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Date:   Mon Apr 27 15:38:05 2026 -0700

    [Model Runner V2] Skip attention metadata rebuild before draft prefill (#40410)

    Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

commit 5e2c37facde9f5edd68a7de8293107089e9887d8
Author: Andreas Karatzas <akaratza@amd.com>
Date:   Mon Apr 27 15:08:57 2026 -0500

    [ROCm][CI] Add missing quantization methods and fix online quant test failures (#39801)

    Signed-off-by: Andreas Karatzas <akaratza@amd.com>

commit c8bbe05189babd69312876c1dcdc80912207e154
Author: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Date:   Mon Apr 27 14:16:22 2026 -0400

    [Perf] Update TRTLLM supported MoE routing methods (#39141)

    Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
    Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
    Co-authored-by: root <root@bia0030.bia.clusters.nvidia.com>
    Co-authored-by: root <root@bia0036.bia.clusters.nvidia.com>

commit 6232fb4b66b42c8e5f4ef1cc4c5163442cc99208
Author: Zhewen Li <zhewenli@meta.com>
Date:   Mon Apr 27 10:58:06 2026 -0700

    [Docker] Install numactl CLI in CUDA runtime image (#41032)

    Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
    Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
    Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 2c06cf3486a67efcbdf265b8a183f9ed836cebb7
Author: Moritz Sanft <58110325+msanft@users.noreply.github.com>
Date:   Mon Apr 27 17:22:35 2026 +0200

    [Bugfix] use `served_model_name` for multimodal error message (#41003)

    Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>

commit e6f710a87f3ce8b137d15ffa4b3a12568e1c8aa3
Author: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Date:   Mon Apr 27 16:19:57 2026 +0100

    Deprecate support for Transformers v4 (#40389)

    Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit c245d35ff467bb3e9a73fcb3c4b02e6c7a3d2964
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Mon Apr 27 21:26:51 2026 +0800

    [Model] Add MiMo-V2.5 support (#40967)

    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
    Signed-off-by: Isotr0py <Isotr0py@outlook.com>
    Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
    Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
    Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
    Co-authored-by: zjy0516 <zhujiangyun@inferact.ai>
    Co-authored-by: yasong <yasong.wang@inferact.ai>
    Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
    Co-authored-by: Copilot <copilot@github.com>

commit f8ac0c7cf0e3d4ac8894346005bdffe3bd7bd378
Author: Xiaoshuang Wang <1790571317@qq.com>
Date:   Mon Apr 27 20:57:13 2026 +0800

    [Bugfix] Fix k_norm weight sharding in MiniMaxM2Attention when total_num_kv_heads < tp_size (#38191)

    Signed-off-by: wxsIcey <1790571317@qq.com>
    Signed-off-by: Icey <1790571317@qq.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit ebf862c351dc4bcaf65de34c3caebe6df6e9e214
Author: Simon Mo <simon.mo@hey.com>
Date:   Mon Apr 27 01:17:52 2026 -0700

    Add system_fingerprint field to OpenAI-compatible API responses (#40537)

    Co-authored-by: Claude <noreply@anthropic.com>

commit 8d8062d0a7014b4cde064024ae5d5a8715a833b3
Author: wang.yuqi <yuqi.wang@daocloud.io>
Date:   Mon Apr 27 15:48:37 2026 +0800

    [Examples] Resettle generate examples. (#36464)

    Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

commit 985961345a13f3e3bb15d29c94b011ba9a6b858b
Author: Roy Wang <jasonailu87@gmail.com>
Date:   Mon Apr 27 15:47:39 2026 +0800

    [Bugfix] Install libcublas-dev in Dockerfile for FlashInfer CuTe DSL JIT (#39855)

    Signed-off-by: esmeetu <jasonailu87@gmail.com>
    Co-authored-by: Roger Wang <hey@rogerw.io>

commit 706a04d34ba64ea23d430d5e50038791aacfae96
Author: Yongye Zhu <zyy1102000@gmail.com>
Date:   Mon Apr 27 03:37:43 2026 -0400

    [DSV4] Add silu clamp limit to shared expert (#40950)

    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

commit 22631f80a01a04b06398952e77d7890ab660ab10
Author: Isotr0py <mozf@mail2.sysu.edu.cn>
Date:   Mon Apr 27 15:27:06 2026 +0800

    [Bugfix] Remove invalid deepstack boundary check for Qwen3-VL (#40932)

    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

commit 2cc008e7b491bb77f0caf4d27ad55a83f196114c
Author: Bhoomit <bhoomit.2010@gmail.com>
Date:   Sun Apr 26 22:48:36 2026 -0700

    [Attention][TurboQuant] Share dequant buffers, eliminate float16_copy (#40941)

    Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
    Signed-off-by: Vasani Bhoomit <bhoomit.2010@gmail.com>

commit 5d5c7764446da0d4888add9a060604e376e4e856
Author: Zhanda Zhu <49645678+zhandaz@users.noreply.github.com>
Date:   Mon Apr 27 06:44:15 2026 +0100

    [Perf] FP8 FlashInfer Attn for ViT (#38065)

    Signed-off-by: Zhanda Zhu <zhandazhu@gmail.com>
    Co-authored-by: Yubo Gao <ybgao-nvidia@users.noreply.github.com>

commit 592ae6805cb87c9a44c29bd9c6eef9d04d91e39b
Author: ojhaanshika <anshikao@nvidia.com>
Date:   Sun Apr 26 22:15:29 2026 -0700

    Cutlass W4A16 (Machete) Tests (#35450)

    Signed-off-by: Anshika Ojha <anshikao@nvidia.com>

commit 7b1bc0a3eb01a6bc2650eda9970049f7825240d7
Author: Dao007forever <dao007forever@gmail.com>
Date:   Sun Apr 26 21:33:13 2026 -0700

    [Bugfix] Cap SWA/chunked-local runtime admission to startup pool-sizing bound (#40946)

    Signed-off-by: Dao Le <Dao007forever@gmail.com>
    Signed-off-by: Nick Hill <nickhill123@gmail.com>
    Co-authored-by: Claude <noreply@anthropic.com>
    Co-authored-by: Nick Hill <nickhill123@gmail.com>

commit c0879d94839a4cc0febba20cc1cb5642fc5c9cc4
Author: Silu Panda <31051721+SiluPanda@users.noreply.github.com>
Date:   Sun Apr 26 19:26:51 2026 -0700

    [Tests] Gate Isaac under Transformers v5 (#40907)

    Signed-off-by: Silu Panda <31051721+SiluPanda@users.noreply.github.com>

commit f5f987851493e6f09ab2ddeb3f33ae878eda0353
Author: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Date:   Sun Apr 26 19:12:08 2026 -0700

    [Model Runner V2] Fix rejection sampling acceptance rate gap vs MRV1 (#40651)

    Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>

commit 2ce95a761b9acd925d1bc69cfd1d4fc13de9e2b7
Author: youkaichao <youkaichao@gmail.com>
Date:   Mon Apr 27 09:37:22 2026 +0800

    Auto-disable expandable_segments around cumem memory pool (#40812)

    Signed-off-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: Claude <noreply@anthropic.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

commit 4d51588e2381018348f1022dfa3a7698899805b7
Author: Yifan Qiao <yifanqiao@inferact.ai>
Date:   Sun Apr 26 18:31:08 2026 -0700

    [Feat] DeepSeek V4 Rebased  (#40860)

    Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
    Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
    Signed-off-by: qizixi <zixi@inferact.ai>
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
    Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
    Co-authored-by: Yongye Zhu <yongye@inferact.ai>
    Co-authored-by: Simon Mo <simon@inferact.ai>
    Co-authored-by: Bugen Zhao <i@bugenzhao.com>
    Co-authored-by: Giancarlo Delfin <gdelfin@inferact.ai>
    Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
    Co-authored-by: Nick Hill <nickhill123@gmail.com>
    Co-authored-by: Roger Wang <hey@rogerw.io>
    Co-authored-by: Roy Wang <yasong.wang@inferact.ai>
    Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: Zhewen Li <jerven.vllm@gmail.com>
    Co-authored-by: Zijing Liu <liuzijing2014@gmail.com>
    Co-authored-by: khluu <khluu000@gmail.com>
    Co-authored-by: qizixi <zixi@inferact.ai>
    Co-authored-by: Zhewen Li <zhewenli@inferact.ai>

commit 32e45636e3d7e02615facc8c63645ce4ac1d7e11
Author: Xinan Miao <1403572259@qq.com>
Date:   Mon Apr 27 01:44:42 2026 +0800

    [torch.compile]: Disable Sequence Parallelism (SP) for piecewise compilation (#38373)

    Signed-off-by: SouthWest7 <am1ao@qq.com>
    Signed-off-by: Xinan Miao <1403572259@qq.com>
    Co-authored-by: SouthWest7 <am1ao@qq.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
    Co-authored-by: OpenAI Codex <codex@openai.com>
    Co-authored-by: Wang Xingran <72983099+wangxingran222@users.noreply.github.com>
    Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

commit b39c266dae8cd7aee31f667c973e9698ed0b2361
Author: omerpaz95 <73347585+omerpaz95@users.noreply.github.com>
Date:   Sun Apr 26 15:06:01 2026 +0300

    [KV Offload] Offload all KV blocks when doing prefill in P/D (#40346)

    Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
    Signed-off-by: omerpaz95 <73347585+omerpaz95@users.noreply.github.com>
    Co-authored-by: Or Ozeri <or@ozery.com>

commit 9558f43903faa1b6db08ac98802bf88111196345
Author: Dao007forever <dao007forever@gmail.com>
Date:   Sun Apr 26 01:26:34 2026 -0700

    [Bugfix] Size FlashInfer NVLink MNNVL workspace to EP group (#40893)

    Signed-off-by: Dao Le <Dao007forever@gmail.com>

commit 8cd174fa358326d5cc4195446be2ebcd65c481ce
Author: Jee Jee Li <pandaleefree@gmail.com>
Date:   Sun Apr 26 09:55:19 2026 +0800

    [LoRA] MoE LoRA Refactor (#40338)

commit c798593f0d88cec583c599ea7ea40a2cc26c312b
Author: Chauncey <chaunceyjiang@gmail.com>
Date:   Sun Apr 26 08:58:50 2026 +0800

    [Bugfix] Fix the DSML token leakage in DSV4/3.2 (#40806)

    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
    Signed-off-by: sfeng33 <4florafeng@gmail.com>
    Co-authored-by: sfeng33 <4florafeng@gmail.com>
    Co-authored-by: Windswithyou 1694599440@qq.com

commit 12a3f6454b973d7cd8806d398ba287a7e1d22c63
Author: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Date:   Sat Apr 25 23:50:12 2026 +0300

    [Bugfix][MoE] Only unpad routed output before shared expert add or routed output transform (#40865)

    Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
    Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

commit 60cd878a3beca91e63d9a34a9c60fd335e780182
Author: Or Ozeri <oro@il.ibm.com>
Date:   Sat Apr 25 20:00:46 2026 +0300

    [kv_offload+HMA][11/N]: Support store with multiple KV groups (#39403)

    Signed-off-by: Or Ozeri <oro@il.ibm.com>

commit 1e9f19ca3fd29f83442ab83b08d4642e691c95bd
Author: rasmith <Randall.Smith@amd.com>
Date:   Sat Apr 25 08:34:14 2026 -0500

    [CI][AMD]BugFix] Fix deadlock occuring in test_moe_layer (#40767)

    Signed-off-by: Randall Smith <Randall.Smith@amd.com>

commit 6646c0c7e0c921709c9b194e3988dfaabda5ee15
Author: labAxiaoming <34019940+labAxiaoming@users.noreply.github.com>
Date:   Sat Apr 25 21:04:26 2026 +0800

    [Opt] Optimize deepstack buffer handling for multimodal Qwen3 models (#40145)

    Signed-off-by: xiaoming <1259730330@qq.com>

commit 95995bbef81292e3ee1ef0df5ca3989bb481bdd5
Author: Andreas Karatzas <akaratza@amd.com>
Date:   Sat Apr 25 00:25:20 2026 -0500

    [ROCm][Engine] Fix GPU memory leaks in engine shutdown and test workaround for async KV prefix cache reset (#38503)

    Signed-off-by: Andreas Karatzas <akaratza@amd.com>

commit 07351e0883470724dd5a7e9730ed10e01fc99d08
Author: Chenguang Zheng <645327136@qq.com>
Date:   Sat Apr 25 11:57:41 2026 +0800

    [Feature] Warm up readonly multimodal processor during renderer startup (#40797)

    Signed-off-by: Chenguang ZHENG <645327136@qq.com>
    Co-authored-by: OpenAI Codex <codex@openai.com>

commit 428b988c98a0dee06c47d4a70858317b60169461
Author: Andreas Karatzas <akaratza@amd.com>
Date:   Fri Apr 24 21:59:31 2026 -0500

    [ROCm][CI] Fix `trust_remote_code` AttributeError in EAGLE3 acceptance length test (#40306)

    Signed-off-by: Andreas Karatzas <akaratza@amd.com>

commit e54894fc85a9861fb38a49701b5844462c3d77e4
Author: Andreas Karatzas <akaratza@amd.com>
Date:   Fri Apr 24 21:20:59 2026 -0500

    [ROCm][CI] Fix TestSiluMulGroupFp8QuantModel after W8A8 block linear refactor (#39799)

    Signed-off-by: Andreas Karatzas <akaratza@amd.com>

commit bc2ae5a3d6b59690b6a3312f0ed63842e8bc600b
Author: Angela Yi <angelayi@meta.com>
Date:   Fri Apr 24 17:59:20 2026 -0700

    [Test] Increase qwen2_vl num_logprobs to fix torch 2.12 update (#40818)

    Signed-off-by: Angela Yi <angelayi@meta.com>

commit a474da28131f61684849b31e29af0eebaaedc383
Author: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Date:   Fri Apr 24 19:28:18 2026 -0400

    [Refactor] Remove unused dead code (#40640)

    Signed-off-by: yewentao256 <zhyanwentao@126.com>

commit ce6a199ecc0996254efcf6fe532c40d9b9432922
Author: Lucas Kabela <lucaskabela@meta.com>
Date:   Fri Apr 24 16:25:03 2026 -0700

    [BE][Bugfix] Respect TORCH_COMPILE_DISABLE env var at the vLLM config level for torch 2.12 (#40715)

    Signed-off-by: Lucas Kabela <lucaskabela@meta.com>

commit f88763efc35f8da4d3cfe611a0497d3d3251b9e9
Author: Ignacio Sica <mignacio.sica@gmail.com>
Date:   Fri Apr 24 20:13:52 2026 -0300

    [Bugfix] add seq_lens_cpu_upper_bound to CommonAttentionMetadata in mla_runner.py (#40844)

    Signed-off-by: ignaciosica <mignacio.sica@gmail.com>

commit 333529deae59cd4100df540f225470c9bc539bee
Author: Artem Perevedentsev <aperevedents@nvidia.com>
Date:   Sat Apr 25 01:06:41 2026 +0300

    [EPLB] Fix replica selection bias in fused_moe router (#40810)

    Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

commit 88256082058fdbd41281c4f1f9a19663a4d7a668
Author: Zhang Jian <jianmusings@gmail.com>
Date:   Sat Apr 25 04:40:07 2026 +0800

    [Bugfix][CI] Fix wrong residual shape in TestFusedAddRMSNorm.example_inputs that causes flaky test (#40629)

    Signed-off-by: Zhang Jian <jianmusings@gmail.com>

commit 095d2f87e8519de27f1fc39d9d22b299efdf0010
Author: qli88 <qiang.li2@amd.com>
Date:   Fri Apr 24 14:54:40 2026 -0500

    [Bug] Fix GLM-5.1 running error on ROCm platform (#40763)

    Signed-off-by: Qiang Li <qiang.li2@amd.com>

commit 21792520e727676e4d4e8bd24a8fe29da4dab152
Author: Neil Schemenauer <nas-github@arctrix.com>
Date:   Fri Apr 24 10:24:05 2026 -0700

    [Build] Add Python 3.14 to supported version list. (#34770)

    Signed-off-by: Neil Schemenauer <nas@arctrix.com>
    Co-authored-by: Simon Mo <simon.mo@hey.com>

commit 5e11b403657ebd5507e07200c2ba2b8f186d07da
Author: Alex Brooks <albrooks@redhat.com>
Date:   Fri Apr 24 10:30:00 2026 -0600

    [Frontend] Delegate to vLLM Omni When `--omni` Passed (#40744)

    Signed-off-by: Alex Brooks <albrooks@redhat.com>

commit f768b4473e1bd55023dcaff63984cfdd08902fc8
Author: labAxiaoming <34019940+labAxiaoming@users.noreply.github.com>
Date:   Fri Apr 24 23:26:09 2026 +0800

    [Docs] Add docs for context extension using the yarn method (#37430)

    Signed-off-by: xiaoming <1259730330@qq.com>
    Signed-off-by: labAxiaoming <34019940+labAxiaoming@users.noreply.github.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

commit 914d0464c1b1ec77d1560b485624f32002532b83
Author: JartX <sagformas@epdcenter.es>
Date:   Fri Apr 24 17:18:06 2026 +0200

    [Refactor] Unify 2D/3D kernels in triton_unified_attention (#40631)

    Signed-off-by: JartX <sagformas@epdcenter.es>

commit 9f771b3ab92d26a7d91a8255572c5d8d2b3ad601
Author: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Date:   Fri Apr 24 21:29:44 2026 +0800

    [Quantization] add humming quantization kernel (#34556)

commit c9d3c6e6af7fb848d3f03e256484f68a00201020
Author: Itay Alroy <75032521+itayalroy@users.noreply.github.com>
Date:   Fri Apr 24 16:05:31 2026 +0300

    fused_moe: treat NIXL EP as batched experts (#40412)

    Signed-off-by: Itay Alroy <ialroy@nvidia.com>

commit 51adca74e6be951c86e920046a83bfc061193ba2
Author: Or Ozeri <oro@il.ibm.com>
Date:   Fri Apr 24 15:32:29 2026 +0300

    [kv_offload+HMA][9/N]: Support lookup with multiple KV groups (#39401)

    Signed-off-by: Or Ozeri <oro@il.ibm.com>

commit e8eb0490ce098b1add05877363b185f3a7b570c5
Author: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Date:   Fri Apr 24 14:53:23 2026 +0300

    [Bugfix][MoE] Unpad routed output before shared expert add [Fixes #35949] (#40794)

    Signed-off-by: Netanel Haber <nhaber@nvidia.com>

commit e8ee2a78dbc08d398d5e798a149657b8aa821850
Author: Jiangyun Zhu <riverclouds.zhu@qq.com>
Date:   Fri Apr 24 19:25:55 2026 +0800

    [Attention] use diff kv backend for mimo v2 flash (#40045)

    Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

commit 2ec18f5df43e7f6c51e95125759904d39bd01630
Author: Thomas <153741656+thomasmaindron@users.noreply.github.com>
Date:   Fri Apr 24 13:01:56 2026 +0200

    [Bugfix][Parser] Fix Mistral tool parser for HF tokenizers (#39294)

    Signed-off-by: thomasmaindron <thomasmaindron@users.noreply.github.com>
    Co-authored-by: thomasmaindron <thomasmaindron@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
    Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

commit 6dec49f27ece339c59d5eb92f33120c11c0c3b74
Author: Dmitry Tokarev <dtokarev@nvidia.com>
Date:   Fri Apr 24 06:27:11 2026 -0400

    [Build] Bump CUDA to 13.0.2 to match PyTorch 2.11.0 (#40669)

    Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>

commit b5587e1013d0e352bb33c30b456d5221aebecd8c
Author: Shanshan Shen <467638484@qq.com>
Date:   Fri Apr 24 18:12:14 2026 +0800

    [CI/Build] Add e2e test for ViT CUDA graph (#40780)

    Signed-off-by: shen-shanshan <467638484@qq.com>

commit 9ad5abe7722ba4eb9cb484689dd90529e76c41c5
Author: milesial <milesial@users.noreply.github.com>
Date:   Fri Apr 24 02:18:55 2026 -0700

    Fix Nano Nemotron VL static image inputs (#40724)

    Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>

commit 7d3195ea9fc88e31131099d2d2122fe38558a87a
Author: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Date:   Fri Apr 24 01:40:20 2026 -0700

    [Bugfix] Fix IMA in DSA + MTP (#40772)

    Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

commit 512f52219240b0aa1be687955ab52fcdd0c5a40e
Author: Luciano Martins <lucianomartins@google.com>
Date:   Fri Apr 24 01:27:46 2026 -0700

    [Model] Gemma4: add bidirectional vision attention for sliding layers with window guard (#40534)

    Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
    Signed-off-by: Luciano Martins <lucianomartins@google.com>
    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
    Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
    Co-authored-by: Isotr0py <2037008807@qq.com>
    Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

commit 4c34b2f6fc63435c791c9054c579ca3f8c902bb6
Author: Yuwen Zhou <yuwen.zhou@intel.com>
Date:   Fri Apr 24 16:26:16 2026 +0800

    [XPU] Enable torch.compile for XPU GDN attention (#39466)

    Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
    Signed-off-by: Yuwen Zhou <yuwen.zhou@intel.com>
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

commit cf8a613a87264183058801309868722f9013e101
Author: Xin Yang <105740670+xyang16@users.noreply.github.com>
Date:   Thu Apr 23 23:51:05 2026 -0700

    Support only half types for concat_mla_q kernel (#37892)

    Signed-off-by: Xin Yang <xyangx@amazon.com>

commit 01acf96c6f57914479e6bfe79d7bd5777a2fc49f
Author: xiangdong <40376367+zxd1997066@users.noreply.github.com>
Date:   Fri Apr 24 14:08:45 2026 +0800

    [XPU][CI] Fix Docker cleanup races on Intel CI runners (#40761)

    Signed-off-by: zengxian <xiangdong.zeng@intel.com>

commit 079a4cf399ad548d442fd92bfffbfbe460b6613…

Labels

bug: Something isn't working
gpt-oss: Related to GPT-OSS models
ready: ONLY add when PR is ready to merge/full CI is needed
rocm: Related to AMD ROCm

Projects

Status: Done

3 participants