
[Mooncake] Add metrics for MooncakeStoreConnector operations #43392

Merged
ywang96 merged 7 commits into vllm-project:main from Dao007forever:mooncake-store-metrics
May 23, 2026

Conversation

@Dao007forever
Contributor

Summary

Adds per-operation telemetry for the store-pool variant of the Mooncake KV connector — the sibling MooncakeConnector (P2P) already exposes MooncakeKVConnectorStats, but the store connector had nothing equivalent (TODO at vllm/distributed/kv_transfer/kv_connector/v1/mooncake/stats.py:15).

Every Mooncake RPC (save_exists, save_put, load_get, lookup_exists) now records duration + key count + byte count + status (ok / partial_failure / error) + failed-key count. These flow through the existing KVConnectorLogging pipeline to the engine logger, and through a new MooncakeStorePromMetrics (1 histogram + 4 counters, labelled by operation and status) to Prometheus.
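
As a rough illustration of the per-operation record described above, the following sketch models one record and a reduce step keyed by (operation, status). The names `OperationRecord` and `reduce_records` are hypothetical and are not the classes added by this PR; the rule that maps failures to a status is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationRecord:
    """One timed store call (hypothetical shape, not the PR's data model)."""

    operation: str  # e.g. "save_exists", "save_put", "load_get", "lookup_exists"
    duration_s: float
    num_keys: int
    num_bytes: int
    failed_keys: int
    raised: bool = False  # True when the call itself raised an exception

    @property
    def status(self) -> str:
        # Assumed rule: an exception is "error", any failed key is
        # "partial_failure", everything else is "ok".
        if self.raised:
            return "error"
        return "partial_failure" if self.failed_keys else "ok"


def reduce_records(records):
    """Sum counts and durations per (operation, status) pair."""
    totals = defaultdict(
        lambda: {"count": 0, "keys": 0, "bytes": 0, "failed_keys": 0, "seconds": 0.0}
    )
    for record in records:
        bucket = totals[(record.operation, record.status)]
        bucket["count"] += 1
        bucket["keys"] += record.num_keys
        bucket["bytes"] += record.num_bytes
        bucket["failed_keys"] += record.failed_keys
        bucket["seconds"] += record.duration_s
    return dict(totals)
```

Keying the totals by (operation, status) mirrors the label pair used for the Prometheus series, so one reduce result could feed both the logger and the counters.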

What changed

  • New vllm/distributed/kv_transfer/kv_connector/v1/mooncake/store/metrics.py: MooncakeStoreConnectorStats(KVConnectorStats) (per-op list[record] data model, aggregate / reduce for the logger) and MooncakeStorePromMetrics(KVConnectorPromMetrics).
  • store/worker.py: KVTransferThread accepts an optional record_operation callback; send/recv threads wrap batch_is_exist / batch_put_from_multi_buffers / batch_get_into_multi_buffers with timing; MooncakeStoreWorker owns a locked stats accumulator, exposes get_kv_connector_stats() (swap-and-return), and times the batch_is_exist call in lookup().
  • store/connector.py: adds get_kv_connector_stats() (instance) and the build_kv_connector_stats / build_prom_metrics classmethods that the metrics plumbing dispatches on.
  • Tests: 5 new in test_mooncake_store_worker.py (send-thread + recv-thread metric capture, aggregate/reduce, worker swap-and-reset, lookup); 2 new in test_mooncake_store_connector.py (worker delegation, stats reconstruction from a dict).
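
The callback-plus-accumulator pattern in the worker changes can be sketched roughly as follows. `StatsAccumulator` and `timed_call` are hypothetical names chosen for illustration, and the record tuple is simplified to three fields; the actual worker code may be structured differently.

```python
import threading
import time


class StatsAccumulator:
    """Thread-safe record list with swap-and-return semantics (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = []

    def record(self, operation, duration_s, num_keys):
        # Called from the send/recv threads, so the append is locked.
        with self._lock:
            self._records.append((operation, duration_s, num_keys))

    def swap_and_return(self):
        # Hand back everything collected so far and start a fresh list,
        # so each record is reported exactly once.
        with self._lock:
            records, self._records = self._records, []
        return records


def timed_call(record, operation, fn, keys):
    """Run fn(keys) and report its wall-clock duration via the callback."""
    start = time.perf_counter()
    result = fn(keys)
    record(operation, time.perf_counter() - start, len(keys))
    return result
```

Swapping the list under the lock keeps the critical section short: the reader leaves with its own list and the worker threads never block on aggregation.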

Notes

  • This repo's KVCacheStoreSendingThread calls batch_put_from_multi_buffers(keys, addrs, sizes, replicate_config) (4 positional args) — the metric wrap is around the existing call site unchanged.
  • The recv thread has an existing VLLM_MOONCAKE_STORE_TIER_LOG branch that calls batch_get_replica_desc before the data fetch; the load_get timer starts after that diagnostic call so the metric measures only what the user actually waits for.
  • Histogram buckets target Mooncake RPC latencies (1ms–4s) — sub-millisecond buckets dropped, longer-tail buckets added.
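
To make the bucket range concrete, here is a small sketch that assumes the bounds double from 1 ms up to about 4 s. The PR does not list its exact bucket values here, so `BUCKETS_S` is a hypothetical stand-in; only the 1ms–4s range comes from the note above.

```python
from bisect import bisect_left

# Hypothetical bounds in seconds: 0.001, 0.002, 0.004, ... up to about 4.096.
BUCKETS_S = [0.001 * 2**i for i in range(13)]


def bucket_upper_bound(duration_s):
    """Return the smallest bucket bound that is >= duration_s.

    This follows the cumulative "le" (less than or equal) convention of
    Prometheus histograms; anything above the last bound lands in +Inf.
    """
    index = bisect_left(BUCKETS_S, duration_s)
    return BUCKETS_S[index] if index < len(BUCKETS_S) else float("inf")
```

With bounds like these, a 3 ms RPC falls in the 4 ms bucket, while anything slower than the last bound is only visible in the +Inf bucket.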

Why this is not a duplicate

Checked with:

gh pr list --repo vllm-project/vllm --state open --search "MooncakeStoreConnector in:body"
gh pr list --repo vllm-project/vllm --state open --search "MooncakeStorePromMetrics"
gh pr list --repo vllm-project/vllm --state open --search "mooncake_store_metrics"
gh issue list --repo vllm-project/vllm --state open --search "MooncakeStoreConnector metrics"

The 5 open MooncakeStoreConnector PRs (#42584 SupportsHMA, #42694 reset_cache, #42788 load failures, #43281 finish-after-preemption, #43371 save/load queue) are unrelated to telemetry. The mooncake/stats.py:15 TODO ("add MooncakePromMetrics ... in a follow-up PR") for the store connector is still open.

Test plan

  • pre-commit run --files <5 changed files> — all hooks pass (ruff check, ruff format, mypy-local, SPDX, etc.).
  • .venv/bin/python -m pytest tests/v1/kv_connector/unit/test_mooncake_store_worker.py tests/v1/kv_connector/unit/test_mooncake_store_connector.py — 52/52 in-scope tests pass.
    • 6 tests (test_requester_worker_init_*, test_topology_*) fail in my local env due to a pre-existing macOS sockaddr_un.sun_path 103-char limit in LookupKeyServer; unrelated to this change. These should run clean in CI / Linux.
  • End-to-end on a real Mooncake deployment: scrape /metrics, confirm vllm:mooncake_store_operation_time_seconds, vllm:mooncake_store_operation_total, vllm:mooncake_store_operation_keys_total, vllm:mooncake_store_operation_bytes_total, vllm:mooncake_store_operation_failed_keys_total series appear with labels for save_exists, save_put, load_get, and lookup_exists.
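
The end-to-end check could be automated along these lines. This is a sketch that works on the text of a scraped /metrics response; the metric names are taken from the test plan, while the label key `operation` and the helper name `missing_series` are assumptions.

```python
EXPECTED_METRICS = [
    "vllm:mooncake_store_operation_time_seconds",
    "vllm:mooncake_store_operation_total",
    "vllm:mooncake_store_operation_keys_total",
    "vllm:mooncake_store_operation_bytes_total",
    "vllm:mooncake_store_operation_failed_keys_total",
]
EXPECTED_OPERATIONS = ["save_exists", "save_put", "load_get", "lookup_exists"]


def missing_series(metrics_text):
    """Return (metric, operation) pairs with no sample line in the scrape."""
    # Drop blank lines and "# HELP" / "# TYPE" comment lines.
    samples = [
        line
        for line in metrics_text.splitlines()
        if line and not line.startswith("#")
    ]
    missing = []
    for name in EXPECTED_METRICS:
        for operation in EXPECTED_OPERATIONS:
            label = f'operation="{operation}"'  # assumed label key
            # Prefix match so histogram _bucket/_sum/_count lines also count.
            if not any(s.startswith(name) and label in s for s in samples):
                missing.append((name, operation))
    return missing
```

An empty return value means every expected series was seen at least once; series for an operation only appear after that operation has actually run on the deployment.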

AI assistance disclosure

Per AGENTS.md: this PR was produced with AI assistance (Claude). I (the submitter) have reviewed every changed line. The accompanying tests cover the new metric recording paths (success, partial failure, aggregate/reduce, and swap-and-reset semantics).

@Dao007forever Dao007forever force-pushed the mooncake-store-metrics branch from 3de230c to b8271f1 Compare May 22, 2026 06:18
@zhewenl zhewenl added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 22, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces per-operation telemetry for the MooncakeStoreConnector, enabling the tracking of duration, key/byte counts, and status for RPC operations such as save_exists, save_put, load_get, and lookup_exists. The implementation includes a new metrics module for data aggregation and Prometheus integration, instrumentation of worker threads, and comprehensive unit tests. Feedback identifies a potential AttributeError in the Prometheus observe method when stats are missing and a timing logic error in the receiving thread that could result in incorrect duration reporting during failures.

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/mooncake/store/metrics.py Outdated
Comment thread vllm/distributed/kv_transfer/kv_connector/v1/mooncake/store/worker.py Outdated
Dao007forever and others added 4 commits May 21, 2026 23:31
Adds per-operation telemetry (save_exists, save_put, load_get,
lookup_exists) to the store-pool variant of the Mooncake KV connector.
Each call records duration, key count, byte count, status
(ok/partial_failure/error), and failed-key count into a new
MooncakeStoreConnectorStats serialized to the engine logger, plus a
MooncakeStorePromMetrics histogram+counters exposed via Prometheus,
labelled by (operation, status). This closes the TODO at
vllm/distributed/kv_transfer/kv_connector/v1/mooncake/stats.py:15 for
the store connector, mirroring ivanium#35.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
- metrics.py: guard MooncakeStorePromMetrics.observe against None /
  empty transfer_stats_data; widen the parameter type to match.
- worker.py: in KVCacheStoreRecvingThread._handle_request, anchor
  load_get_start at the top of each batch iteration so an exception
  inside the tier-log lookup attributes time to *this* batch (not the
  previous one), and re-anchor right before the RPC so success-path
  durations exclude the tier lookup.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
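
The timer anchoring described in the commit message above can be sketched as follows. `timed_batch_get`, `fetch`, and `tier_lookup` are hypothetical stand-ins for the recv-thread logic, not the code in worker.py.

```python
import time


def timed_batch_get(fetch, tier_lookup=None, clock=time.perf_counter):
    """Time one batch fetch, returning (result, duration_s, status)."""
    # Anchor at the top of the batch so a failure in the diagnostic lookup
    # is attributed to this batch rather than to a stale start time.
    start = clock()
    try:
        if tier_lookup is not None:
            tier_lookup()
            # Re-anchor so the success-path duration excludes the lookup.
            start = clock()
        result = fetch()
    except Exception:
        return None, clock() - start, "error"
    return result, clock() - start, "ok"
```

Passing the clock as a parameter is only a convenience for testing; it lets a fake clock verify that the lookup time is excluded on success and included when the lookup itself fails.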
@Dao007forever Dao007forever force-pushed the mooncake-store-metrics branch from 012b96e to 2cfe45a Compare May 22, 2026 06:31
Member

@njhill njhill left a comment


Comment thread vllm/distributed/kv_transfer/kv_connector/v1/mooncake/store/metrics.py Outdated
…trics.py

Co-authored-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Dao007forever <dao007forever@gmail.com>
@ywang96 ywang96 merged commit 819c610 into vllm-project:main May 23, 2026
64 of 65 checks passed
lrioxh pushed a commit to lrioxh/vllm-dev that referenced this pull request May 24, 2026
Liuweixiong0118 pushed a commit to Liuweixiong0118/vllm that referenced this pull request Jun 1, 2026
hynky1999 added a commit to macrodata-labs/vllm that referenced this pull request Jun 2, 2026
* [MM] Enable FlashInfer metadata support for Qwen2.5-VL vision attention (#42787)

Signed-off-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [Docs] Fix stale version number in token_embed.md (#43488)

Signed-off-by: holegots <ikun3.1415927@gmail.com>

* [Docs] Fix stale version number in token_classify.md (#43489)

Signed-off-by: holegots <ikun3.1415927@gmail.com>

* [MoE] Migrate W4A8 CT to oracle kernel setup (#42680)

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>

* [Mooncake] Add metrics for MooncakeStoreConnector operations (#43392)

* [ROCm][Critical] Fix the GDN import bug (#43486)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Revert "[Misc] add humming to dependencies" (#43492)

* [Bugfix] Fix reasoning dropped on streaming boundary deltas (#42691)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [Model Runner v2] Force v1 runner for tests (#43233)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [KV Connector] Keep MooncakeStore full hits block-aligned (#43494)

Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [kv_offload]: Add DSv4 support (#43142)

Signed-off-by: Or Ozeri <oro@il.ibm.com>

* [ROCm][CI] Stabilize 400 error return code for invalid schema inputs (#43016)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [ROCm] [DSv4] [Perf] Support DeepSeek v4 MTP (#43385)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Tuning script and configs for Triton Mamba SSU kernel (#43083)

Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>

* File system secondary tier implemented in python (#41735)

Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>

* [Kernel] Add mhc_pre_big_fuse_with_norm_tilelang  (#43474)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* fix: MoE model using shared routed experts crashes on AMD GPUs (#42373)

Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>

* [Docs] Reorganize offline inference docs.  (#43552)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)

Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [Feat][KVConnector] Support DSV4 in SimpleCPUOffloadBackend (#42296)

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

* [Doc] Add section on escalating stalled contributions (#43568)

Signed-off-by: esmeetu <jasonailu87@gmail.com>

* Reduce memory usage for granite_speech. (#42933)

Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [KV Connector] Handle Mooncake finish after preemption (#43281)

Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>

* [Misc] Print accuracy value for PD tests even on success  (#43583)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Kernel] Remove NormGateLinear (#43554)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [XPU] Ensure RNG offset alignment with PyTorch requirements in XPU sampler (#43028)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [LoRA] Add one shot triton kernel For MoE LoRA (#42290)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [DeepSeek V4] Move MegaMoE input prep kernel to nvidia/ops (#43632)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [KV Connector][Bugfix] MooncakeStore: don't double-apply Eagle prune in load_mask (#43516)

Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [KV Connector] Propagate MooncakeStore load failures (#42788)

Signed-off-by: Dao Le <Dao007forever@gmail.com>

* [Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler (#43194)

Signed-off-by: Yan Ma <yan.ma@intel.com>

* [Frontend] Split the offline inference APIs and utils. (#43553)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [Bugfix][Model] Fix GPT2ForSequenceClassification sub-module prefix (#43579)

Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [GDN] GDN Prefill kernel for SM100 (#43273)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [CPU] Enable non-divisible GQA for decode workitems in mixed batches (#43032)

Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>

* Upgrade tpu-inference to v0.20.0 (#43394)

* Add CuTe DSL sparse compressor support (#43584)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>

* [chores][log] change registry log from `warning` to `debug` (#43045)

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [Bugfix] Apply fc_norm in Eagle3DeepseekV2 combine_hidden_states (#43482)

Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [KV Transfer] Enable HMA by default for connectors that support it (#41847)

Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>

* [Misc][Refactor][ROCm] Convert MoRI-related envvars to extra config args (#43303)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

* [Misc] Support interleaved custom image benchmark datasets (#43636)

Signed-off-by: ThibaultCastells <thib.castells@icloud.com>

* [Reasoning] [Bugfix] Reject invalid thinking_token_budget values (#43402)

Signed-off-by: linzm1007 <linzm1007@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Model] Use AutoWeightsLoader for InternLM2 (#38278)

Signed-off-by: Jesus De Jesus <dejesus.9297@gmail.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [XPU] Fix fused MoE LoRA kernel crash on XPU by using platform-agnos num_compute_units (#43646)

Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>

* Fix CuPy runtime deps and restore humming (#43530)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ROCm][CI] Extend ROCm quick reduce coverage (#40990)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)

* [MoE Refactor] Migrate ModelOptMxFp8FusedMoE to oracle (#42768)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [MoE Refactor] W4a8 int8 oracle (#42789)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [ROCm] Remove MegaMoE integration in deepseek v4 (#43629)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Add LM head quantization support for ModelOpt (#42124)

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

* [Doc] Add line limit to AGENTS.md (#43635)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>

* [DSv4] Drop _get_compressed_kv_buffer in DeepseekCompressor (#43690)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [CI] Soft-fail AMD entrypoints mirror tests (#43709)

Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Kernel] Porting  fuse_minimax_qk_norm  to manual fusion (#43410)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [KV Connector] MooncakeStore: drop dead discard_partial_chunks parameter (#43627)

Signed-off-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Bugfix][V1] Fix TOCTOU race causing intermittent `EADDRINUSE` on multi-API-server DP startup (#42585)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ci] Add arm64 ci image (#41303)

Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Bugfix] Split attention groups by num_heads_q for spec-decode drafts (#43543)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [Rust Frontend] Add reasoning/tool parser & renderer roundtrip tests (#43582)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [ROCm][CI] Fix ROCm multimodal Qwen2.5-VL activation compile and Phi4MM ragged image mask handling (#43647)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Perf] Optimize Fp8BlockScaledMMLinearKernel input_scale tensor using new_empty() (#43677)

Signed-off-by: Xin Yang <xyangx@amazon.com>

* [Attention] Make FlexAttention and FlashAttention use num-blocks first layouts (#42095)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

* [MLA][Attention] Add OOT MLA prefill backend registration mechanism (#43325)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

* [Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [DSv4] Refactor compressor & Fix ROCm compatibility (#43710)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Fix test_aot_compile for torch 2.12 (#43695)

Signed-off-by: Angela Yi <yiangela7@gmail.com>

* [KVConnector][Mooncake] Wire reset_cache cascade end-to-end (#42694)

Signed-off-by: aoshen524 <aoshen524@gmail.com>
Signed-off-by: Ao Shen <aoshen@inferact.ai>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [ROCm][Perf] Expose AITER MoE sorting dispatch policy via env var (#39177)

Signed-off-by: nholmber <nholmber@users.noreply.github.com>

* [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [Frontend] Add MiniCPM5 XML tool call parser (#43175)

Signed-off-by: zhangtao <zhangtao2@modelbest.cn>
Signed-off-by: zhangtao2 <zhangtao2@modelbest.cn>
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [ROCm][GPT-OSS] Avoid repeated compile-time `cos_sin_cache.to(bf16)` casts in rotary path (#42833)

Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>

* [Doc] Add Ascend NPU tab to the quickstart installation guide (#43550)

Signed-off-by: Aditya Singh <adisin650@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Rust Frontend] Align tool parser fallback behavior between streaming & non-streaming paths (#43662)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Docs] Fix MLA prefill backend default docs (#43697)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Kernel] Enable TritonW4A16LinearKernel as CUDA fallback for non-Marlin-aligned W4A16 shapes (#43731)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [misc] Bump cutedsl version to 4.5.2 (#43745)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1 (#39155)

Signed-off-by: Injae Ryou <injaeryou@gmail.com>

* [Docs] Fix the duplicate doc icon issue (#43546)

Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>

* Fix early CUDA init (#43791)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [ROCm] mori: add InterNodeV1LL inter-node kernel selection via VLLM_MORI_INTERNODE_KERNEL (#41751)

Signed-off-by: jatseng-ai <jatseng@amd.com>

* [8/n] Migrate merge_attn_states, mamba, sampler to torch stable ABI (continued) (#43361)

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [Quantization] Fix Humming RoutedExperts import (#43540)

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* Remove Transformers forward/backward compatibility tests (#43785)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Validate against some config fields being set to 0 (#43794)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix][DFlash]allocate the proper number of lookahead slots (#43733)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>

* Fix Qwen3-VL and Qwen3-omni-thinker accuracy degradation from deepstack inputs under torch.compile (#43617)

Signed-off-by: Dakai An <dakaian108@gmail.com>

* Add @AndreasKaratzas to CODEOWNERS (#43740)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Bugfix][Kernel] TRTLLM NVFP4 MoE chunking (#43599)

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>

* [ModelRunnerV2][Hybrid model] Support kernel block size in hybrid model (#38831)

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Rust Frontend] Introduce mock engine for benchmark baseline (#43469)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* Fix RunAI streamer tensor buffer reuse during weight loading (#43464)

Signed-off-by: bbartels <benjamin@bartels.dev>

* [MoE] Remove inplace fused experts mechanism (#43727)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [Misc][Rocm] Remove redundant `AiterUnifiedAttentionBackend` block size log (#43664)

Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [ROCm][CI] Stabilize Cargo cache and pre-test image checks (#43815)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* fix: parse Qwen3 XML JSON arguments first (#43243)

Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>

* [Bugfix] Pass `routed_scaling_factor` to FlashInfer TRTLLM BF16 MoE (#43769)

* [BugFix] Fix blocked reasoning parsing with MRV2 (#43808)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Bugfix][Frontend] streaming tool-call serializer drops first args chunk when name and args share a DeltaMessage  (#42683)

Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: sfeng33 <4florafeng@gmail.com>

* minor docs: fix incorrect example path (#43830)

Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>

* [ROCm][DSV4] Enable Tilelang MHC replacing torch/triton mhc (#43679)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* change name of fs_python secondary tier to fs. (#43600)

Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>

* [BugFix] Fix hard-coded timeout for multi-API-server startup (#43768)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Kernel] Marlin MoE: include SM 12.x in default arch list (#40923)

Signed-off-by: Tony Liu <tonyliu0512@gmail.com>
Co-authored-by: Tony Liu <tonyliu0512@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [DSV4] Remove AMD/XPU path in deepseek_v4/nvidia (#43829)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Restore `Literal` for `WeightTransferConfig.backend` (#43183)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix] Stream DeepSeek DSML tool-call argument deltas incrementally (#42879)

Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [ROCm][CI] Move workload from MI300 to MI325 (#43824)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Feature] Add support for timed trace replay in `vllm bench serve` to replay Moonshot and Alibaba workload traces (#39795)

Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>

* [UX] Increase DP Coordinator startup timeout from 30s to 120s (#42343)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>

* [Model][Bugfix] Rename weight_mapper to hf_to_vllm_mapper in LlamaNemotronVL pooling models (#43581)

Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: opencode <noreply@opencode.ai>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>

* [Bugfix][ROCm] Fix Accuracy Drop in Sparse Indexer on gfx950 (#43781)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

* [Bugfix] Fix HyperCLOVAX CI failure after upstream removed remote code (#43860)

Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [CI] Auto-apply `rust` label to relevant PRs (#43866)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Feature] Add structured output and effort support to Anthropic Messages API (#42396)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* Log dummy DP step in iteration details (#41406)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>

* [EC Connector] Add shutdown API to EC Connector. (#42423)

Signed-off-by: omerpaz95 <omerpaz95@gmail.com>

* Fix `OlmoHybridForCausalLM` not initialising (#43846)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [BUGFIX] Multimodal benchmark with MistralTokenizer (#42965)

Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* [Perf] Optimize moe permute by pre-allocate buffer, 9~14% kernel performance improvement (#43014)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Perf][KDA] Fuse gate softplus, chunk-local cumsum, and RCP_LN2 scaling (#43667)

Signed-off-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>

* Add token-offset based selective offload in OffloadConnector (#39983)

Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>

* [Model Refactoring] Remove torch compile dependency in DSv4 (#43746)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Bugfix][ROCm] Resolve MoRI connector hangs at high concurrency (#40344)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

* [CPU] Migrate cpu_awq into awq_marlin (#43841)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Rust Frontend] Add `hy_v3` tool parser (#43872)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Rust Frontend] Reduce Gemma4 tool parser args scan complexity (#43850)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [rust] fix: aggregate `is_sleeping` and `reset_prefix_cache` across DP engines (#43429)

Signed-off-by: Will.hou <1205157517@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Bug] Fix `tests/distributed/test_elastic_ep.py  - assert False` (#43813)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Perf] Add do_not_specialize to Mamba SSD chunk kernels (#43803)

Signed-off-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: Majid Taheri Andani <tahemaji@amazon.com>

* [Bugfix] Exclude Ray DP from #42585's deferred port allocation (#43864)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

* [KV Offload] Rename `SecondaryTierManager.get_finished()` to `get_finished_jobs()` (#43870)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

* [ROCm][Perf] Support N=5 in wvSplitK skinny GEMM kernels for speculative decoding (#40687)

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

* [XPU][MoE] Add WNA16 oracle backend for GPTQ sym-int4 (xpu_fused_moe) (#41426)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [ROCm] Bump ROCm to 7.2.3 (#43136)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

* Add Cosmos3 Reasoner model (#43356)

Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>

* [Rust Frontend] Optimize multimodal prompt expansion (#43670)

Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>

* Allow native KV cache dtype in Triton cache update (#43330)

Signed-off-by: Michael Gschwind <mgschwind@nvidia.com>
Co-authored-by: Michael Gschwind <mgschwind@nvidia.com>

* [Attention][AMD] Standardize kv layout to blocks first for AMD (#43660)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [ROCm] Enable the aiter top-k/top-p sampler by default (#43331)

Signed-off-by: John Qin <yanyuan.qin@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

* [MM][CG] Avoid over-padding Qwen2.5-VL encoder cudagraph window metadata (#42796)

Signed-off-by: Hua Huang <huah@nvidia.com>

* Deprecate `JAISLMHeadModel` (#43784)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Feat] Add support for per GPU worker RDMA NIC selection (#42083)

Signed-off-by: Raj Joshi <rajjoshi@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* [Core] Cleanup KVConnector handling with PP + fix MRV2 (#43732)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [KV Offload] Add per-request offloading policy via `on_new_request` lifecycle hook (#43205)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Model Refactoring] Remove unnecessary torch op registration for DSv4 (#43891)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Spec Decode] Allow causal DFlash (#43445)

* Refactor output filename handling in ci-fetch-log.sh (#43901)

Signed-off-by: Michael Goin <mgoin64@gmail.com>

* [AMD][CI][BugFix] Fix Distributed Compile Unit Tests (2xH100-2xMI300) group (#43120)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>

* fix(frontend): Add multimodal placeholders to Gemma4 tool message template (#41459)

Signed-off-by: Harshal Janjani <harshaljanjani@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>

* [CI] Enable prefix caching in BFCL benchmark (#43925)

Signed-off-by: Yifan Zong <yzong@redhat.com>

* [Model]Support Step-3.7-Flash (#43859)

Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>

* [Rust Frontend] Add `/version` endpoint using engine-reported value (#43854)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Misc][NUMA] Auto-bind to PCT priority cores on DGX B300 + widen EngineCore across shard NUMA nodes (#43270)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Co-authored-by: Cursor <noreply@cursor.com>

* [DSv4] Move mHC tilelang kernels & Don't use CustomOP in dsv4/nvidia (#43905)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [feat] add GlmgaProcessor specific logits in `glm4_1v.py` (#43575)

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>

* Adjust design around encoder_cudagraph_forward (#42288)

Signed-off-by: Weida Hong <wdhongtw@google.com>

* [XPU] add scale transpose to prepare_fp8_moe_layer_for_xpu and bump up kernels (#43277)

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [kv_offload] Skip decode-phase blocks in CPU offload (#43797)

Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>

* [Refactor] Remove dead code (#43234)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [9/n] Migrate attention and cache kernels to torch stable ABI (continued) (#43717)

Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [CI] Separate non-root smoke tests from image build step (#43712)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [XPU] add gelu_tanh to xpu moe backend supported activations (#42822)

Signed-off-by: yintong-lu <yintong.lu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [CPU Backend] CPU top-k and top-p sampling kernels using Triton (#43633)

Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [ROCm][DSv4] Remove device pipeline stall in sparse attention (#43898)

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

* [Frontend]Responses API supports chat_template_kwargs (#43761)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

* [ROCm][CI] Fix AITER unified attention for encoder-decoder cross-attention (#43945)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [XPU] fix xpu install document triton-xpu version (#43947)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* [CI][ROCm] Don't skip MoRI-IO Connector tests (#43703)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

* [XPU] support MTP of gdn attention (#43565)

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [CI] Nixl+SimpleCPUOffloadingConnector unit tests (#43871)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Bugfix] Fix Step3 pipeline parallel KeyError for residual tensor (#37622)

Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

* [Kernel][ROCm] Native W4A16 kernel for AMD RDNA3 (gfx1100) — fp16 + bf16 (#41394)

Signed-off-by: JartX <sagformas@epdcenter.es>

* [Bugfix] [ROCm] [DSV4] Fix AITER MXFP4 MoE weight loading and shuffle… (#42595)

Co-authored-by: MHYangAMD <MHYangAMD@users.noreply.github.com>

* [ROCm][Perf] DSv3.2 MI355X TP4 decode-step orchestration cleanup (3 micro-opts) (#42982)

Signed-off-by: Frida Andersson <fanderss@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* [Bugfix] Corrupted MLA + linear attention (#43961)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* Skip docs build if PR doesn't affect docs (#43972)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Bugfix][CPU] Remove invalid extra deps (#43977)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* Add vLLM library info to Hugging Face Hub requests (#43857)

Signed-off-by: Wauplin <lucainp@gmail.com>
Signed-off-by: Lucain Pouget <lucain@huggingface.co>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* docs: clarify ITL acronym in optimization docs (#43922)

Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>

* [Misc] added unit tests for the core pooling methods (#43818)

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [Bugfix] Disable allreduce_rms_fusion when pipeline_parallel_size > 1 (#43616)

Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude <noreply@anthropic.com>

* [MoE Refactor] WNA16 MoE backend selection into oracle module (#42553)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [EPLB] Make async EPLB default (#43219)

Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

* [Bugfix] Use storage_block_size in KV cache reshape for compressed specs (DeepSeek V4) (#43988)

Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [Bugfix] Fix Ray placement group allocation with grouped nodes (#43998)

Signed-off-by: <conway.zhu@cohere.com>
Signed-off-by: root <conway.zhu@cohere.com>

* [Bug] Fix torch device issue for MOE permute (#44005)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [CI] Make Model Executor test hangs fail fast with a traceback (#43971)

Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [CI] Remove redundant test_chat_with_tool_reasoning.py (#44011)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* Add @khluu to CODEOWNERS (#44019)

Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

* [Feature] SSL support for dp supervisor (#43688)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Metrics] Exclude KV transfer tokens from iteration_tokens_total (#43346)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Frontend] Clean up stop_token_ids override for Harmony (#44009)

Signed-off-by: Yifan Zong <yzong@redhat.com>

* [MoE Refactor] Migrate MoeWNA16Method quantization to MK oracle (#42647)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [MoE Refactor] Remove supports_expert_map (#43108)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [CI] Remove duplicate Harmony test coverage (#44023)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [CI] Fix smoke test step key to bypass block gate (#43974)

Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Revert "[MoE Refactor] Migrate MoeWNA16Method quantization to MK orac… (#44033)

Signed-off-by: Bill Nell <bnell@redhat.com>

* [PERF]MiniMax-M2 gate kernel (#38445)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>

* offload prompt_embeds decode in render_prompts_async to avoid blocking (#43792)

Signed-off-by: Gagan Dhakrey <gagandhakrey@gmail.com>

* [Refactor] Remove dead current_tool_name_sent assignments from tool parsers (#43997)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [ROCm][CI] Fix failure in the Phi3V pooling test (#44028)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [ROCm] cmake: support PYTORCH_FOUND_HIP for torch 2.13 native HIP language support (#43881)

Signed-off-by: nemanjaudovic <nudovic@amd.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [BugFix][Platform] Fix import vllm.platforms.rocm error on non-CUDA test_gpt_oss.py (#43571)

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [Bugfix] Fix RMSNorm kernels to multiply in weight's native dtype (#42379)

Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [ROCm] Add attention sink support to AITer flash attention backend (#43817)

Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>

* [Governance] Add @BugenZhao as Rust frontend code owner (#44047)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Bug] Fix gemma4 MTP IMA issue when TP>1, `CUDA error: an illegal memory access was encountered` (#43909)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [MRV2] Support breakable CUDA graph (#44050)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [CPU][Zen] Route W8A8 and W4A16 linear inference through zentorch on AMD Zen CPUs (#41813)

Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

* [CI/Build] Enable Step3p7ForConditionalGeneration testing (#43956)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* docs: fix MLA attention docstring examples (#44118)

Co-authored-by: nightcityblade <nightcityblade@gmail.com>

* [Misc] Use VLLMValidationError consistently in chat completion and completion protocol validators (#36254)

Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>

* [MRV2] Remove Eagle's dedicated CUDA graph pool (#44078)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

* [BugFix] Fix `_has_module` to verify native deps via trial import (#44035)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Docs] Replace broken video url in examples (#44159)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [CPU][RISC-V] Add missing RVV cpu_types helpers for WNA16 (#42730)

Signed-off-by: wcy <233313160abc@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>

* fix: glm5.1 pp model loading (#42944)

Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>

* [Frontend] Resettle generative scoring entrypoint. (#44153)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

* [Rust Frontend] Add InternLM2 tool parser (#43481)

Signed-off-by: Will.hou <1205157517@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

* [Bugfix] fix wrong partial_rotary_factor calculation for bailing_moe model. (#43770)

Signed-off-by: zzt <zengzetang.zzt@antgroup.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>

* [XPU][CI] Fix test_audio_in_video flake by using module-scoped server fixture (#44146)

Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>

* [Perf] Optimize cutlass fp8 scaled mm bypassing padding, 20% kernel performance improvement (#43706)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Feature] Add support for JetBrains' Mellum v2 code generation model (#43992)

Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [Kernel][DSv4] Optimize sparse FP8 compressor kernels (#44161)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [ROCm][CI] Fix and stabilize EAGLE3 acceptance tests (#41294)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>

* [Rust Frontend] Support streaming `generate` endpoint (#43779)

Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

* [Frontend][Core] Add sparse NCCL weight transfer support for in-place updates (#40096)

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>

* [BugFix][CI] Fix added `_has_module` tests (#44248)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Test][BugFix] Fix double-BOS in PD+specdec acceptance test (#44234)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [DSV4] Remove unnecessary classes & functions (#44246)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [ROCm][CI] Skip unbacked dynamic shapes tests on PyTorch < 2.11 (#44256)

Signed-off-by: JartX <sagformas@epdcenter.es>

* [DSV4] Refactor RoPE initialization (#44262)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Bugfix][Mooncake] Release GPU pin on failed store in MooncakeStoreConnector (#43742)

Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [ROCm] Upgrade AITER to v0.1.13.post1 (#44265)

Signed-off-by: Micah Williamson <micah.williamson@amd.com>

* [Bugfix][CI] Normalize NIXL connector CUDA wheel installs (#44266)

Signed-off-by: Alec Flowers <aflowers@nvidia.com>

* [Refactor] Move unstreamed tool-arg flush from serving layer to parser (#44017)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [CI] Stabilize OpenAI schema fuzzing for malformed structural tags (#44131)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [BugFix] Fix TypeError in MiniCPM-O audio feature unpadding (#38053)

Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Signed-off-by: wjinxu <1299461899@qq.com>
Signed-off-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Co-authored-by: wjinxu <1299461899@qq.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Kc Balusu <kcbalusu@users.noreply.github.com>

* [BugFix][kv_offload]: Prevent offloading stale sliding window blocks (#42959)

Signed-off-by: Or Ozeri <oro@il.ibm.com>

* [XPU][Bugfix] Fix per_token_group_fp8_quant missing dummy args on XPU (#43930)

Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [MM][CG] Profile encoder CUDA graph pool memory (#41714)

Signed-off-by: JooHo Lee <jooho414@gmail.com>

* [Bugfix] Convert Gemma4-MM ViT linear layers to vllm native impl (#43798)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>

* [Model Runner V2] Support zeroing freshly allocated KV blocks for hybrid + fp8 KVCache (#43990)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

* [Model Runner V2] Use actual batch max_seq_len for attn metadata (#43991)

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

* [Refactor] Unify reasoning + tool-call parsing behind Parser.parse() (#44267)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

---------

Signed-off-by: Hua Huang <huah@nvidia.com>
Signed-off-by: holegots <ikun3.1415927@gmail.com>
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: ThibaultCastells <thib.castells@icloud.com>
Signed-off-by: linzm1007 <linzm1007@126.com>
Signed-off-by: Jesus De Jesus <dejesus.9297@gmail.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Kevin Luu <kevin@inferact.ai>
Signed-off-by: Zhewen Li <zhewen@inferact.ai>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Angela Yi <yiangela7@gmail.com>
Signed-off-by: aoshen524 <aoshen524@gmail.com>
Signed-off-by: Ao Shen <aoshen@inferact.ai>
Signed-off-by: nholmber <nholmber@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: zhangtao <zhangtao2@modelbest.cn>
Signed-off-by: zhangtao2 <zhangtao2@modelbest.cn>
Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
Signed-off-by: Aditya Singh <adisin650@gmail.com>
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: jatseng-ai <jatseng@amd.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Signed-off-by: Dakai An <dakaian108@gmail.com>
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Signed-off-by: JINO-ROHIT <find.jinorohit@gmail.com>
Signed-off-by: Tony Liu <tonyliu0512@gmail.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Signed-off-by: Animesh Trivedi <Animesh.Trivedi@ibm.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: omerpaz95 <omerpaz95@gmail.com>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: haojiangzheng <justineric096@gmail.com>
Signed-off-by: Angelo Ruocco <ang@zurich.ibm.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Will.hou <1205157517@qq.com>
Signed-off-by: Majid Taheri Andani <tahemaji@amazon.com>
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Maciej Bala <mbala@nvidia.com>
Signed-off-by: MaciejBalaNV <mbala@nvidia.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Michael Gschwind <mgschwind@nvidia.com>
Signed-off-by: John Qin <yanyuan.qin@amd.com>
Signed-off-by: Raj Joshi <rajjoshi@redhat.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Harshal Janjani <harshaljanjani@gmail.com>
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Weida Hong <wdhongtw@google.com>
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Signed-off-by: yintong-lu <yintong.lu@intel.com>
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Frida Andersson <fanderss@amd.com>
Signed-off-by: Wauplin <lucainp@gmail.com>
Signed-off-by: Lucain Pouget <lucain@huggingface.co>
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: zixi-qi <zixi@inferact.ai>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: <conway.zhu@cohere.com>
Signed-off-by: root <conway.zhu@cohere.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: qianlihuang <91178480+qianlihuang@users.noreply.github.com>
Signed-off-by: Gagan Dhakrey <gagandhakrey@gmail.com>
Signed-off-by: nemanjaudovic <nudovic@amd.com>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Signed-off-by: Xiaoran Chen <xiaoran@fb.com>
Signed-off-by: R <Ganesh.R@amd.com>
Signed-off-by: Harshal Adhav <harshal.adhav@amd.com>
Signed-off-by: Aakar Dwivedi <aadwived@amd.com>
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
Signed-off-by: wcy <233313160abc@gmail.com>
Signed-off-by: UranusSeven <109661872+UranusSeven@users.noreply.github.com>
Signed-off-by: zzt <zengzetang.zzt@antgroup.com>
Signed-off-by: Madeesh Kannan <madeeswaran.kannan@jetbrains.com>
Signed-off-by: xunzhuo <xunzhuo@vllm-semantic-router.ai>
Signed-off-by: Alec Flowers <aflowers@nvidia.com>
Signed-off-by: Krishna Chaitanya Balusu <krishnabkc15@gmail.com>
Signed-off-by: wjinxu <1299461899@qq.com>
Signed-off-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Signed-off-by: JooHo Lee <jooho414@gmail.com>
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Signed-off-by: Hynek Kydlicek <kydlicek.hynek@gmail.com>
Co-authored-by: Hua Huang <huangh1994@outlook.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Holegots <fuergaosi@gmail.com>
Co-authored-by: Siddharth Bedekar <104613085+bedeks@users.noreply.github.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Dao007forever <dao007forever@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>
Co-authored-by: Rotem Shavitt <rshavitt@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Nguyễn Thế Duy <dtnguyen@nvidia.com>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Roy Wang <jasonailu87@gmail.com>
Co-authored-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Zhewen Li <zhewenli@meta.com>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Huanyu Yang <20242081160@mail.dlut.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: zhao, zhenhui <zhenhui.zhao@intel.com>
Co-authored-by: Sting Lin <sting.lin@cienet.com>
Co-authored-by: Jie Fang <jief@nvidia.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Hank_ <37239608+ILikeIneine@users.noreply.github.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Ethan Feng <ethan.fengch@gmail.com>
Co-authored-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: Thibault Castells <38716394+ThibaultCastells@users.noreply.github.com>
Co-authored-by: linzm1007 <96732179+linzm1007@users.noreply.github.com>
Co-authored-by: Javier De Jesus <javier.dejesusj9@gmail.com>
Co-authored-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Luciano Martins <22145370+lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Angela Yi <yiangela7@gmail.com>
Co-authored-by: aoshen02 <aoshen@inferact.ai>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Nico Holmberg <nico.holmberg@amd.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: zhangtao2-1 <478679312@qq.com>
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: akii96 <aakif.nawaz@amd.com>
Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com>
Co-authored-by: Ashwin Giridharan <ashwing@users.noreply.github.com>
Co-authored-by: Injae Ryou <injaeryou@gmail.com>
Co-authored-by: Chunyang Wen <chunyang.wen@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: jatseng-ai <jatseng@amd.com>
Co-authored-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Minh Vu <vuhoangminh97@gmail.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Dakai An <77474977+andakai@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Co-authored-by: Ignacio Sica <mignacio.sica@gmail.com>
Co-authored-by: JINO ROHIT <find.jinorohit@gmail.com>
Co-authored-by: tonyliu312 <56969792@qq.com>
Co-authored-by: Tony Liu <tonyliu0512@gmail.com>
Co-authored-by: jack <QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: Animesh Trivedi <animesh.trivedi@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Co-authored-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: opencode <noreply@opencode.ai>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: kliuae <17350011+kliuae@users.noreply.github.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: omerpaz95 <73347585+omerpaz95@users.noreply.github.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: zexplorerhj <zhjoneson@163.com>
Co-authored-by: haojiangzheng <justineric096@gmail.com>
Co-authored-by: Angelo Ruocco <angeloruocco90@gmail.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Will.hou <1205157517@qq.com>
Co-authored-by: Majid <mjtaheri68@gmail.com>
Co-authored-by: Majid Taheri Andani <tahemaji@amazon.com>
Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Jason Elie Bou Kheir <5115126+jasonboukheir@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: MaciejBalaNV <mbala@nvidia.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Chao-Ju Chen <ricky.chen@infinirc.com>
Co-authored-by: Mike G <180722391+mikekg@users.noreply.github.com>
Co-authored-by: Michael Gschwind <mgschwind@nvidia.com>
Co-authored-by: JohnQinAMD <yanyuan.qin@amd.com>
Co-authored-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Raj Joshi <rajjoshi@g.harvard.edu>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Harshal Janjani <harshaljanjani@gmail.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: yzong-rh <yzong@redhat.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Yu Huang <yuhuang@nvidia.com>
Co-authored-by: Jee Jee Li <jeejeelee@inferact.ai>
Co-authored-by: Cursor <noreply@cursor.com>
Co-authored-by: Jared Wen <w13431838023@gmail.com>
Co-authored-by: Weida Hong <wdhongtw@google.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
Co-authored-by: Itay Etelis <92247226+Etelis@users.noreply.github.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Yintong Lu <yintong.lu@intel.com>
Co-authored-by: Tianmu Li <tianmu.li@intel.com>
Co-authored-by: Joaquín Mondéjar <111321569+JMonde@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: MHYangAMD <meng-hsuan.yang@amd.com>
Co-authored-by: MHYangAMD <MHYangAMD@users.noreply.github.com>
Co-authored-by: frida-andersson <fanderss@amd.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
Co-authored-by: Ilya Markov <markovilya197@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: czhu-cohere <conway.zhu@cohere.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Yiliu Dong <91178480+qianlihuang@users.noreply.github.com>
Co-authored-by: Gagan Dhakrey <59848316+gagandhakrey@users.noreply.github.com>
Co-authored-by: nemanjaudovic <152565955+nemanjaudovic@users.noreply.github.com>
Co-authored-by: Liangliang Ma <liangliang.ma@intel.com>
Co-authored-by: Lanze Liu <86434077+liulanze@users.noreply.github.com>
Co-authored-by: Xiaoran <claire.rrchen@hotmail.com>
Co-authored-by: Xiaoran Chen <xiaoran@fb.com>
Co-authored-by: Aakar Dwivedi <82587125+aadwived@users.noreply.github.com>
Co-authored-by: R <Ganesh.R@amd.com>
Co-authored-by: Harshal Adhav <harshal.adhav@amd.com>
Co-authored-by: nightcityblade <jackchen@haloailabs.com>
Co-authored-by: nightcityblade <nightcityblade@gmail.com>
Co-authored-by: Umut Polat <52835619+umut-polat@users.noreply.github.com>
Co-authored-by: Jeffrey Wang <jeffreywang@anyscale.com>
Co-authored-by: wcy <86111164+wcynb1023@users.noreply.github.com>
Co-authored-by: Uranus <109661872+UranusSeven@users.noreply.github.com>
Co-authored-by: zzt <mf1732009@smail.nju.edu.cn>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Xunzhuo <xunzhuo@vllm-semantic-router.ai>
Co-authored-by: Alec <35311602+alec-flowers@users.noreply.github.com>
Co-authored-by: Krishna Chaitanya <krishnabkc15@gmail.com>
Co-authored-by: wjinxu <1299461899@qq.com>
Co-authored-by: Kc Balusu <kcbalusu@users.noreply.github.com>
Co-authored-by: JooHo Lee <96564470+BWAAEEEK@users.noreply.github.com>
Co-authored-by: ZiTian Zhao <zitian.zhao@tencentmusic.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: zhrrr <43847754+izhuhaoran@users.noreply.github.com>
amd-callumm added a commit to ROCm/vllm that referenced this pull request Jun 2, 2026
* [XPU] add gptq(int4) support (#37844)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* [UX] Add a persistent cache for FlashInfer autotuning (#42537)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Bugfix][MRV2] Fix KVCache tensor explicit `kernel_block_size` dim (#42766)

Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Model Refactoring] Move DeepSeek V4 layers to `models/deepseek_v4/` [2/N] (#43039)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* add cutedsl dsv4 indexer fp8 kernel (#42899)

Signed-off-by: george <george@inferact.ai>
Co-authored-by: george <george@inferact.ai>

* [Bugfix][KV Connector] Fix SimpleCPUOffloadScheduler TOCTOU between Phase A and Phase B (#42289)

Signed-off-by: Qiuyang Yue <yueqiuyang1389@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist <noreply@google.com>

* [ci] Route 28 gpu_1_queue tests to h200_35gb queue (#43030)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use keyword arguments for shard_id and expert_id in weight_loade… (#42671)

Signed-off-by: junyanxu <junyanxu5513@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Docs] Add SVG images for pooling models. (#42626)

Signed-off-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

* [XPU] Use custom op collective behavior (#41354)

Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [Misc] Aligning tokwise pooler heads for consistency (#43041)

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>

* [Docs] Reorganize online serving docs. (#41907)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [Frontend] Consolidate beam search by BeamSearchMixin. (#42946)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>

* [Model Refactoring] Move deepseek_v4_ops to models/deepseek_v4 [3/N] (#43073)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [bug] AsyncScheduler drops first post-resume token after pause_generation + clear_cache (#42117)

Signed-off-by: hao-aaron <ahao@anyscale.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [KVConnector][DSV4] HMA support for Mooncake store connector (#42828)

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

* [Model Refactoring] Rename deepseek_v4.py to model.py [4/N] (#43077)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [Misc][MM] Remove redundant code in CLIPAttention (#43046)

Signed-off-by: shen-shanshan <467638484@qq.com>

* [CI] Add MTP + PD disagg test for Qwen3.5 (#42677)

Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

* [Bugfix] Fix top logprobs token placeholders in `/inference/v1/generate` (#42887)

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>

* [Perf][4/n] Eliminate various GPU<->CPU syncs (#42347)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [XPU] update xpu graph usage (#43043)

Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>

* [Model] Openvla support (#42654)

Signed-off-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com>

* [Refactor] Extract extract_types_from_schema utility from Minimax M2 tool parser (#43025)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [Misc] add humming to dependencies (#42540)

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>

* [feat] Add FP8 per-tensor Q scale support to Triton attention backend (#42080)

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* [Docs] Fix MooncakeStoreConnector role in disaggregated example (#42994)

Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [Bugfix][MoE] FlashInfer one-sided: workspace union across heterogeneous layers (#42976)

Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

* [CI failure] Temporarily disable using persistent cache for flashinfer autotune (#43119)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ci] Move language models tests (hybrid) back to L4 (#43129)

Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

* [Model] Support post-norm architecture for EAGLE-3 speculators (#42764)

Signed-off-by: Doğaç Eldenk <dogacel@gmail.com>

* Fix error in Dynamic NTK scaling (#41277)

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

* [CPU][DOC] Fix installation commands for Arm CPUs (#43115)

Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>

* [bug] fix WeightTransferConfig.backend to allow for all strings (#43121)

Signed-off-by: ahao-anyscale <ahao@anyscale.com>

* [MRV2][BugFix] Fix default-stream CG capture in P/W LoRA case (#43160)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Cohere] Enable Cohere MoE (#43143)

Signed-off-by: Terrencezzj <terrence@cohere.ai>

* [Perf][Bugfix] Update dflash aux layer indexing (#40727)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

* add enqueue all option to throughput benchmark (#42975)

Signed-off-by: Philip Maybank <pmaybank@amd.com>
Signed-off-by: pmaybank <113125070+pmaybank@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Perf] Avoid forward scan for async output placeholders (#42938)

* [CI] Add DSV4-Flash to gsm8k moe-refactor/config-b200.txt (#42111)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [KV Offload] Pass `OffloadingSpec` instead of `VllmConfig` to secondary tiers (#43076)

Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>

* [ci] Revert model executor test back to L4 (#43188)

Signed-off-by: Kevin H. Luu <khluu000@gmail.com>

* [Docs][PD][NIXL] Lease extension mechanism for blocks on P (#43099)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Docs][PD][NIXL] Bidirectional kv-cache transfer (#43097)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [6/n] Migrate activation kernels, gptq, gguf, non cutlass w8a8 to libtorch stable ABI (continued) (#42663)

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* Enable mermaid diagrams in the docs (#43192)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [GDN] Enable FI Blackwell GDN prefill kernel (#40717)

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

* [XPU][CI] Add 2 server model test files in Intel GPU CI (#42499)

Signed-off-by: zengxian <xiangdong.zeng@intel.com>

* [Frontend] Forward X-data-parallel-rank header on /inference/v1/generate (#42330)

Signed-off-by: hallerite <git@hallerite.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Doc] Sync CLI guide with actual help modes and launch subcommand (#40326)

Signed-off-by: Rui Wang <raygorous@gmail.com>
Co-authored-by: Rui Wang <raygorous@gmail.com>

* [Feature] Support manually enabling the cumem allocator (#33648)

Signed-off-by: Kebe <mail@kebe7jun.com>

* [Spec Decode] Support non-MTP speculation for NemotronH (#43130)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>

* Remove additional dead code as a follow-up to #42889 (#43144)

Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>

* [Bug][Structured Outputs] Fix bug that leads to unconstrained generations with structural tags (#42452)

Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* [Bugfix] Use enable_sm120_family for per-tensor FP8 CUTLASS kernels on SM12.1 (#41215)

Signed-off-by: j9smith <j.smith9103@outlook.com>
Signed-off-by: Joel Smith <j.smith9103@outlook.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [Bugfix] Use shared coerce_to_schema_type in DeepSeekV32 tool parser (#43019)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [MISC] Fix symm_mem cap-equal gate; log AR backend selection (#42993)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>

* [R3] Add routed experts to openai entrypoint (#38939)

Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [CI] Lower granite-4.0-h-tiny gsm8k threshold for Hybrid SSM NixlConnector PD accuracy tests (4 GPUs) (#43186)

Signed-off-by: haosdent <haosdent@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>

* Integrate flashinfer b12x MoE and FP4 GEMM kernels for SM120/121 (#40082)

Signed-off-by: Meenakshi Venkataraman <meenakshiv@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [Perf] Optimize `CutlassFP8ScaledMMLinearKernel` when padding needed by pre-weight processing, 13.5% TTFT improvement (#42651)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>

* [Bugfix][CI] Add missing import of pad_nvfp4_activation_for_cutlass in flashinfer (#43237)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* Add dllehr-amd to CODEOWNERS and committers list (#42772)

Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com>

* [Perf][gpt-oss] Downgrade triton_kernels to v3.5.1 (#43135)

Signed-off-by: mgoin <mgoin64@gmail.com>

* [Misc] downgrade nvidia-cutlass-dsl to 4.5.0 (#43230)

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

* [ROCm] Add QuickReduce min-size override and codec threshold (#41675)

Signed-off-by: <>

* [CI] Add composed-schema regression tests for DeepSeek V3.2/V4 parsers (#43255)

Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>

* [Model Runner V2] Fix lora `Triton Error [CUDA]: device-side assert triggered` (#43139)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* update GPU json file based on h200 recipes (#43262)

Signed-off-by: louie-tsai <louie.tsai@intel.com>

* [Minor] Bigger overlap for FI AR (#43103)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [Bugfix] Fix Qwen3.5 GatedDeltaNet in_proj_ba Marlin failure at TP>=2 (#36329)

Signed-off-by: Adi McM Sonus Flow <biuro@sonusflow.pl>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Perf][Gemma4] Batch vision encoder calls for image and video processing (#43169)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [CI] Fix "test_vit_cudagraph_[image|video][step3_vl]" failure (#43082)

Signed-off-by: haosdent <haosdent@gmail.com>

* [Frontend] Normalize reasoning_content to reasoning for client compatibility (#42664)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Refactor] Use shared coerce_to_schema_type in Seed-OSS tool parser (#43140)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [ToolParser][Bugfix] Re-land: Fix anyOf/oneOf/$ref type resolution in Qwen3CoderToolParser (#37831) (#38973)

Signed-off-by: AAISSJ <maze0717@g.skku.edu>
Signed-off-by: <>
Signed-off-by: sejung-son <sejung.son@nhn.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local>
Co-authored-by: sejung-son <sejung.son@nhn.com>
Co-authored-by: sfeng33 <4florafeng@gmail.com>

* [Frontend][RFC] Rust front-end integration (#40848)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>

* [Bugfix] Warn when renderer_num_workers has no effect on offline LLM (#42905)

Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>

* [Benchmark] Add num-warmup to vllm bench throughput (#43245)

Signed-off-by: Yifan Zong <yzong@redhat.com>

* [Bugfix] Fix glm4_moe_tool_parser._is_string_type for /v1/responses FunctionTool format (#39601)

Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: sfeng33 <4florafeng@gmail.com>

* [CI] De-flake test_models for bigscience/bloom-560m (#43197)

Signed-off-by: haosdent <haosdent@gmail.com>

* [XPU] add setuptools-rust for xpu dependency (#43287)

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

* Update KDA chunk prefill decay to use exp2 semantics (#43195)

Signed-off-by: zexplorerhj <19794632+zexplorerhj@users.noreply.github.com>
Co-authored-by: zexplorerhj <19794632+zexplorerhj@users.noreply.github.com>

* Fix FlashInfer TRTLLM NvFP4 monolithic MoE routing (#43223)

Signed-off-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>

* [Test] Replace zephyr-7b-beta (7B) with SmolLM2-135M in tokenization test (#43085)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Bug] Fix ci issue `assert output_size is not None` AssertionError (#43261)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>

* [CI] Pin protoc binary in rust-build stages (#43292)

Signed-off-by: haosdent <haosdent@gmail.com>

* [XPU][CI]Fix Docker image pull-to-run race in Intel GPU CI (#43266)

Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [CPU][RISC-V] Add VLEN=256 support to RVV attention kernels (#42943)

Signed-off-by: velonica0 <like@mail.nankai.edu.cn>
Signed-off-by: velonica0 <47554626+velonica0@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>

* [Perf] [Hybrid] Fused Triton kernel for GPU-side Mamba state postprocessing (#40172)

Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [CI] Fix CPU tests failing on `tl.exp2` import (#43311)

Signed-off-by: haosdent <haosdent@gmail.com>

* [Bugfix] Add early validation to reject incompatible runner types for embedding models (#43079)

Signed-off-by: anish <anishesg@users.noreply.github.com>
Signed-off-by: Your Name <ak8686@princeton.edu>
Signed-off-by: anish <145943060+anishesg@users.noreply.github.com>
Co-authored-by: anish <anishesg@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [Deprecation] Mark env vars covered by --moe-backend / --linear-backend (#43148)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>

* [Perf] `zeros` -> `empty` to remove additional fill (#42988)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [Core] Add native ModelExpress load format (#43105)

Signed-off-by: Zheng Luo <zheluo@nvidia.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* Disable build isolation to bypass CUDA related deps for vllm-tpu (#43038)

Signed-off-by: Ylang Tsou <ylangt@google.com>
Co-authored-by: Ylang Tsou <ylangt@google.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

* [Frontend] Rework fastokens integration (#43168)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Feature] Add `--cpu-distributed-timeout-seconds` CLI Option for CPU Process Group Timeout (#42968)

Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: zWaNg3 <389750525@qq.com>
Co-authored-by: zWaNg3 <389750525@qq.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [BugFix] Use correct logprobs for `logprob_token_ids` (#43125)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* [Bugfix] Zero stale is_prefilling in padded CUDA graph rows for Mamba (#41873)

Signed-off-by: Lanze Liu <lanzetech@gmail.com>

* [Rust Frontend] Move code from `vllm-frontend-rs` (#43283)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Eric Curtin <eric.curtin@docker.com>
Signed-off-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Signed-off-by: Will.hou <1205157517@qq.com>
Signed-off-by: Will.hou <willamhou@ceresman.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Eric Curtin <eric.curtin@docker.com>
Co-authored-by: Dev-X25874 <283057883+Dev-X25874@users.noreply.github.com>
Co-authored-by: Will.hou <1205157517@qq.com>
Co-authored-by: Will.hou <willamhou@ceresman.com>

Please see https://github.com/Inferact/vllm-frontend-rs for full original commit history.

* [CI] Fix dockerfile dependency graph failure for pre-commit (#43378)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [Bugfix] Fix DSV4 Base model swiglu limit issue in FP8 path (#42855)

Signed-off-by: Chengze Fan <chengze@meta.com>
Signed-off-by: Chengze Fan <fancz2002@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>

* [ROCm] Add XGMI backend for MoRI Connector (#41753)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

* [ROCm][CI] add warmup to mem_util test before measurement (#43236)

Signed-off-by: Divakar Verma <divakar.verma@amd.com>

* [Frontend] Add truncation side to OpenAI endpoints (#43260)

Signed-off-by: Rui Zhang <rza21.bc@gmail.com>
Signed-off-by: Rui Zhang <rui.zhang@globalrelay.net>
Co-authored-by: Rui Zhang <rui.zhang@globalrelay.net>

* [Frontend] DP Supervisor (#40841)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: robertgshaw2-redhat <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Bugfix] Make CuMemAllocator free callback stream-aware (#43020)

Signed-off-by: zixi-qi <zixi@inferact.ai>
Co-authored-by: Claude <noreply@anthropic.com>

* [XPU] Enable multiple key kernels for sparse attention (#37888)

Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [CI] De-flake renderers/test_hf.py::test_resolve_content_format_fallbacks[Qwen/Qwen-VL-string] (#43064)

Signed-off-by: haosdent <haosdent@gmail.com>

* [Model] Use `AutoWeightsLoader` for Voyage (#42972)

Signed-off-by: Furkan Fidan <dev@yufufi.com>

* [Model] Fix MiniCPM-V 4.6 vit_merger qkv weight loading (#43213)

Signed-off-by: tc-mb <tianchi_cai@icloud.com>

* [CI] Fix test_lora_with_spec_decode on V2 model runner (#43314)

Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [CI] Fix "test_awq_load[gemma4-moe-*]" failure (#43296)

Signed-off-by: haosdent <haosdent@gmail.com>

* Correcting the mock classes for MM GC tests (#43321)

Signed-off-by: Weida Hong <wdhongtw@google.com>

* [BugFix] Fix setuptools-rust dep in requirements files (#43377)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* Fix the docker build failure in tpu-inference (#43360)

Signed-off-by: mrjunwan-lang <mrjunwan@google.com>

* [Docs] Note image preprocessing difference between qwen_vl_utils and vllm. (#43393)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [CPU] Experimentally enable Triton and MRV2 (#43225)

Signed-off-by: jiang1.li <jiang1.li@intel.com>

* [Attention] Mamba attention module refactor (#41126)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

* [XPU]feat: add XPU fallback for MoE topk routing and MXFP4 backend (#42951)

Signed-off-by: Ma Jian <jian1.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [Misc] Replace assert with proper exceptions for security and validation in pooling (#43286)

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>

* [Bugfix] Clear P0 mm sender cache on sleep/pause to fix mm_hash desync (#43001)

Signed-off-by: Tobias Wasner <wasnertobias@gmail.com>

* [BugFix] wire make_empty_intermediate_tensors on AyaVision and Voxtral (#43118)

Signed-off-by: Keyi Li <likey6688@gmail.com>
Co-authored-by: Keyi Li <likey6688@gmail.com>

* [LoRA] Reduce memory of 2D weights when EP is set (#42737)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [EPLB] Change default EPLB communicator (#43110)

Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>

* [CI] Fix AMD docker build tests (#43329)

Signed-off-by: haosdent <haosdent@gmail.com>

* Add NVFP4 MOE support for Deepseek V4. (#42209)

Signed-off-by: Shiyang Chen <shiychen@nvidia.com>

* [Multimodal] Simplify ViT CUDA graph interfaces (#41234)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [Rust Frontend] [Refactor] Extract a newtype for utility call ID (#43405)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Bugfix] Source num_qo_heads from Attention layers in Flashinfer/Triton metadata builders (#42650)

Signed-off-by: zhanda <zhandazhu@gmail.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>

* [KV Connector] MooncakeStore: don't co-queue save with load to avoid double delayed-free (#43371)

Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Refactor] Extract DeepSeek V4 sparse MLA impl into model folder (#43149)

* [Frontend] Simplify AuthenticationMiddleware path extraction (#43426)

Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [RFC][EPLB][#32028] Remove dead torch.accelerator.synchronize() from sync path (#40733)

Signed-off-by: SandishKumarHN <3078999+SandishKumarHN@users.noreply.github.com>
Co-authored-by: SandishKumarHN <3078999+SandishKumarHN@users.noreply.github.com>

* [Bugfix] Detect wrong libcute_dsl_runtime.so variant in FlashInfer GDN (#43427)

Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>

* [Bugfix] Clear error message for FP8 torchao quantization on unsupported GPUs (#36854)

Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* mhc_post - remove sts & add vectorized copies (#43437)

Signed-off-by: george <george@inferact.ai>
Co-authored-by: george <george@inferact.ai>

* [Quantization][ModelOpt] W4A16 NVFP4 fused MoE + mixed-precision dispatch (#42566)

Signed-off-by: Juhi Mittal <juhim@nvidia.com>

* [Model Runner V2] Support sharing kv cache layers (#35045)

Signed-off-by: Nick Hill <nickhill123@gmail.com>

* DSv4 fused Q-norm kernel grid refactor (#42353)

* [Perf] Optimize hidden state extraction logic (#37374)

Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [XPU]fix: add XPU platform guards to DeepSeek-V4 ops (#42950)

Signed-off-by: Ma Jian <jian1.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* elastic_ep: stage/commit MoE quant method on reconfigure (#40881)

Signed-off-by: Itay Alroy <ialroy@nvidia.com>

* [Attention] Add head_dim=512 support for FlashInfer trtllm attention backend (#38822)

* Add `model` to `WeightTransferEngine.__init__` (#42922)

Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [DSV4] More multi-stream enablement for c4a (#42925)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [ROCm][CI] Stabilize runner teardown between sampler tests (#43023)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [ROCm][CI] Stabilize Granite tool-use and test URL construction (#43017)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Bugfix] Auto-raise max_num_batched_tokens for prefix-LM multimodal models (#43051)

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Co-authored-by: abinggo <107740309+abinggo@users.noreply.github.com>

* [ROCm][CI] Fix ROCm LoRA Transformers fallback with full CUDA graphs (#41577)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [XPU]feat: enable FP8 block-scaled quantization on XPU (#42952)

Signed-off-by: Ma Jian <jian1.ma@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [XPU] reduce host overhead of XPU MOE (#42915)

Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

* [7/n] Migrate pos_encoding and norm kernels to libtorch stable ABI (continued) (#43209)

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [Misc] Added missing return type annotations to improve mypy and IDE tooling (#43383)

Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>

* [Bugfix] Fix native Triton top-k/top-p kernel assumes contiguous logi… (#42739)

Signed-off-by: xiaogang.zhou <xiaogang.zhou@bytedance.com>
Co-authored-by: xiaogang.zhou <xiaogang.zhou@bytedance.com>

* [ModelOpt] Support Qwen3.5/3.6 VLM quantized prefix mapping (#42546)

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

* Keep scheduler alive for delayed KV connector frees (#43433)

Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>

* fix(eagle3): read norm_before_fc from eagle_config for NVIDIA checkpoint (#42143)

Signed-off-by: FERRARIZHENG <popkart06@gmail.com>

* [Kernel] Batch invariant NVFP4 linear using cutlass (#39912)

Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>

* [ROCm][CI] Remove benchmarks test group and shard long test groups (#41669)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Bugfix][Frontend] Fix input_audio parsing when uuid is present (#43414)

Signed-off-by: ffggs <314137448@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [MM] Enable FlashInfer metadata support for Qwen2.5-VL vision attention (#42787)

Signed-off-by: Hua Huang <huah@nvidia.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* [Docs] Fix stale version number in token_embed.md (#43488)

Signed-off-by: holegots <ikun3.1415927@gmail.com>

* [Docs] Fix stale version number in token_classify.md (#43489)

Signed-off-by: holegots <ikun3.1415927@gmail.com>

* [MoE] Migrate W4A8 CT to oracle kernel setup (#42680)

Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>

* [Mooncake] Add metrics for MooncakeStoreConnector operations (#43392)

* [ROCm][Critical] Fix the GDN import bug (#43486)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Revert "[Misc] add humming to dependencies" (#43492)

* [Bugfix] Fix reasoning dropped on streaming boundary deltas (#42691)

Signed-off-by: sfeng33 <4florafeng@gmail.com>

* [Model Runner v2] Force v1 runner for tests (#43233)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

* [KV Connector] Keep MooncakeStore full hits block-aligned (#43494)

Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [kv_offload]: Add DSv4 support (#43142)

Signed-off-by: Or Ozeri <oro@il.ibm.com>

* [ROCm][CI] Stabilize 400 error return code for invalid schema inputs (#43016)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [ROCm] [DSv4] [Perf] Support DeepSeek v4 MTP (#43385)

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Tuning script and configs for Triton Mamba SSU kernel (#43083)

Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>

* File system secondary tier implemented in python (#41735)

Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>

* [Kernel] Add mhc_pre_big_fuse_with_norm_tilelang (#43474)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* fix: MoE model using shared routed experts crashes on AMD GPUs (#42373)

Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>

* [Docs] Reorganize offline inference docs. (#43552)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [Docker] Non-root support for vllm-openai; add opt-in vllm-openai-nonroot target (#40275)

Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [Feat][KVConnector] Support DSV4 in SimpleCPUOffloadBackend (#42296)

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

* [Doc] Add section on escalating stalled contributions (#43568)

Signed-off-by: esmeetu <jasonailu87@gmail.com>

* Reduce memory usage for granite_speech. (#42933)

Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [KV Connector] Handle Mooncake finish after preemption (#43281)

Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>

* [Misc] Print accuracy value for PD tests even on success (#43583)

Signed-off-by: NickLucche <nlucches@redhat.com>

* [Kernel] Remove NormGateLinear (#43554)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [XPU] Ensure RNG offset alignment with PyTorch requirements in XPU sampler (#43028)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [LoRA] Add one shot triton kernel For MoE LoRA (#42290)

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

* [DeepSeek V4] Move MegaMoE input prep kernel to nvidia/ops (#43632)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [KV Connector][Bugfix] MooncakeStore: don't double-apply Eagle prune in load_mask (#43516)

Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [KV Connector] Propagate MooncakeStore load failures (#42788)

Signed-off-by: Dao Le <Dao007forever@gmail.com>

* [Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler (#43194)

Signed-off-by: Yan Ma <yan.ma@intel.com>

* [Frontend] Split the offline inference APIs and utils. (#43553)

Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [Bugfix][Model] Fix GPT2ForSequenceClassification sub-module prefix (#43579)

Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [GDN] GDN Prefill kernel for SM100 (#43273)

Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>

* [CPU] Enable non-divisible GQA for decode workitems in mixed batches (#43032)

Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>

* Upgrade tpu-inference to v0.20.0 (#43394)

* Add CuTe DSL sparse compressor support (#43584)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>

* [chores][log] change registry log from `warning` to `debug` (#43045)

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [Bugfix] Apply fc_norm in Eagle3DeepseekV2 combine_hidden_states (#43482)

Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>

* [KV Transfer] Enable HMA by default for connectors that support it (#41847)

Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>

* [Misc][Refactor][ROCm] Convert MoRI-related envvars to extra config args (#43303)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

* [Misc] Support interleaved custom image benchmark datasets (#43636)

Signed-off-by: ThibaultCastells <thib.castells@icloud.com>

* [Reasoning] [Bugfix] Reject invalid thinking_token_budget values (#43402)

Signed-off-by: linzm1007 <linzm1007@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Model] Use AutoWeightsLoader for InternLM2 (#38278)

Signed-off-by: Jesus De Jesus <dejesus.9297@gmail.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

* [XPU] Fix fused MoE LoRA kernel crash on XPU by using platform-agnos num_compute_units (#43646)

Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>

* Fix CuPy runtime deps and restore humming (#43530)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Docs][ROCm] MoRI-IO Connector Usage Guide (#43603)

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ROCm][CI] Extend ROCm quick reduce coverage (#40990)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Feat][DSV4] Fuse q pad into deepseek v4 fused kernel (#43162)

* [MoE Refactor] Migrate ModelOptMxFp8FusedMoE to oracle (#42768)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [MoE Refactor] W4a8 int8 oracle (#42789)

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

* [ROCm] Remove MegaMoE integration in deepseek v4 (#43629)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Add LM head quantization support for ModelOpt (#42124)

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>

* [Doc] Add line limit to AGENTS.md (#43635)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>

* [DSv4] Drop _get_compressed_kv_buffer in DeepseekCompressor (#43690)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* [CI] Soft-fail AMD entrypoints mirror tests (#43709)

Signed-off-by: Kevin Luu <kevin@inferact.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Kernel] Porting fuse_minimax_qk_norm to manual fusion (#43410)

Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>

* [KV Connector] MooncakeStore: drop dead discard_partial_chunks parameter (#43627)

Signed-off-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [Bugfix][V1] Fix TOCTOU race causing intermittent `EADDRINUSE` on multi-API-server DP startup (#42585)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [ci] Add arm64 ci image (#41303)

Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Bugfix] Split attention groups by num_heads_q for spec-decode drafts (#43543)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [Rust Frontend] Add reasoning/tool parser & renderer roundtrip tests (#43582)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [ROCm][CI] Fix ROCm multimodal Qwen2.5-VL activation compile and Phi4MM ragged image mask handling (#43647)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>

* [Perf] Optimize Fp8BlockScaledMMLinearKernel input_scale tensor using new_empty() (#43677)

Signed-off-by: Xin Yang <xyangx@amazon.com>

* [Attention] Make FlexAttention and FlashAttention use num-blocks first layouts (#42095)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

* [MLA][Attention] Add OOT MLA prefill backend registration mechanism (#43325)

Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

* [Deprecation] Deprecate functions as scheduled for v0.21.0 (#43358)

Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [DSv4] Refactor compressor & Fix ROCm compatibility (#43710)

Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

* Fix test_aot_compile for torch 2.12 (#43695)

Signed-off-by: Angela Yi <yiangela7@gmail.com>

* [KVConnector][Mooncake] Wire reset_cache cascade end-to-end (#42694)

Signed-off-by: aoshen524 <aoshen524@gmail.com>
Signed-off-by: Ao Shen <aoshen@inferact.ai>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [ROCm][Perf] Expose AITER MoE sorting dispatch policy via env var (#39177)

Signed-off-by: nholmber <nholmber@users.noreply.github.com>

* [MRV2][BugFix] Fix KV connector handling in spec decode case (#43719)

Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [Frontend] Add MiniCPM5 XML tool call parser (#43175)

Signed-off-by: zhangtao <zhangtao2@modelbest.cn>
Signed-off-by: zhangtao2 <zhangtao2@modelbest.cn>
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [ROCm][GPT-OSS] Avoid repeated compile-time `cos_sin_cache.to(bf16)` casts in rotary path (#42833)

Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>

* [Doc] Add Ascend NPU tab to the quickstart installation guide (#43550)

Signed-off-by: Aditya Singh <adisin650@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* [Rust Frontend] Align tool parser fallback behavior between streaming & non-streaming paths (#43662)

Signed-off-by: Bugen Zhao <i@bugenzhao.com>

* [Docs] Fix MLA prefill backend default docs (#43697)

Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>

* [Kernel] Enable TritonW4A16LinearKernel as CUDA fallback for non-Marlin-aligned W4A16 shapes (#43731)

Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>

* [Bugfix] Map reasoning_effort to enable_thinking in chat template kwargs (#43401)

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

* [misc] Bump cutedsl version to 4.5.2 (#43745)

Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>

* [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1 (#39155)

Signed-off-by: Injae Ryou <injaeryou@gmail.com>

* [Docs] Fix the duplicate doc icon issue (#43546)

Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>

* Fix early CUDA init (#43791)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* [ROCm] mori: add InterNodeV1LL inter-node kernel selection via VLLM_MORI_INTERNODE_KERNEL (#41751)

Signed-off-by: jatseng-ai <jatseng@amd.com>

* [8/n] Migrate merge_attn_states, mamba, sampler to torch stable ABI (continued) (#43361)

Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>

* [Quantization] Fix Humming RoutedExperts import (#43540)

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

* [CI] build-rocm-wheels.yml: reduce MAX_JOBS to prevent OOM

Signed-off-by: <callumm@amd.com>

---------

Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: george <george@inferact.ai>
Signed-off-by: Qiuyang Yue <yueqiuyang1389@gmail.com>
Signed-off-by: junyanxu <junyanxu5513@gmail.com>
Signed-off-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Chaojun,Zhang <chaojun.zhang@intel.com>
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: ZhanqiuHu <zhu@redhat.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Signed-off-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: Dao Le <Dao007forever@gmail.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: Doğaç Eldenk <dogacel@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: Terrencezzj <terrence@cohere.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Philip Maybank <pmaybank@amd.com>
Signed-off-by: pmaybank <113125070+pmaybank@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Signed-off-by: Chris Leonard <chleonar@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Signed-off-by: zengxian <xiangdong.zeng@intel.com>
Signed-off-by: hallerite <git@hallerite.com>
Signed-off-by: Rui Wang <raygorous@gmail.com>
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>
Signed-off-by: j9smith <j.smith9103@outlook.com>
Signed-off-by: Joel Smith <j.smith9103@outlook.com>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: haosdent <haosdent@gmail.com>
Signed-off-by: Meenakshi Venkataraman <meenakshiv@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Ace Eldeib <aeldeib@coreweave.com>
Signed-off-by: louie-tsai <louie.tsai@intel.com>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: Adi McM Sonus Flow <biuro@sonusflow.pl>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: AAISSJ <maze0717@g.skku.edu>
Signed-off-by: sejung-son <sejung.son@nhn.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
Signed-off-by: Yifan Zong <yzong@redhat.com>
Signed-off-by: Yiyang Liu <37043548+ianliuy@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: zexplorerhj <19794632+zexplorerhj@users.noreply.github.com>
Signed-off-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Signed-off-by: Isotr0py <Isotr0py@outlook.com>
Signed-off-by: velonica0 <like@mail.nankai.edu.cn>
Signed-off-by: velonica0 <47554626+velonica0@users.noreply.github.com>
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
Signed-off-by: anish <anishesg@users.noreply.github.com>
Signed-off-by: Your Name <ak8686@princeton.edu>
Signed-off-by: anish <145943060+anishesg@users.noreply.github.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Zheng Luo <zheluo@nvidia.com>
Signed-off-by: Ylang Tsou <ylangt@google.com>
Signed-off-by: fangyuchu <fangyuchu@qq.com>
Signed-off-by: zWaNg3 <389750525@qq.com>
Signed-off-by: Lanze Liu <lanzetech@gmail.com>
Signed-off-by: Chengze Fan <chengze@meta.com>
Signed-off-by: Chengze Fan <fancz2002@gmail.com>
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Rui Zhang <rza21.bc@gmail.com>
Signed-off-by: Rui Zhang <rui.zhang@globalrelay.net>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Signed-off-by: zixi-qi <zixi@inferact.ai>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Furkan Fidan <dev@yufufi.com>
Signed-off-by: tc-mb <tianchi_cai@icloud.com>
Signed-off-by: Weida Hong <wdhongtw@google.com>
Signed-off-by: mrjunwan-lang <mrjunwan@google.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Ma Jian <jian1.ma@intel.com>
Signed-off-by: Tobias Wasner <wasnertobias@gmail.com>
Signed-off-by: Keyi Li <likey6688@gmail.com>
Signed-off-by: Markov Ilya <markovilya19@gmail.com>
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
Signed-off-by: zhanda <zhandazhu@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: SandishKumarHN <3078999+SandishKumarHN@users.noreply.github.com>
Signed-off-by: Juhi Mittal <juhim@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: xiaogang.zhou <xiaogang.zhou@bytedance.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
Signed-off-by: FERRARIZHENG <popkart06@gmail.com>
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: ffggs <314137448@qq.com>
Signed-off-by: Hua Huang <huah@nvidia.com>
Signed-off-by: holegots <ikun3.1415927@gmail.com>
Signed-off-by: Siddharth Bedekar <bedeksid@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Dao Le <daole@inferact.ai>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Banani Ghosh <bg2502@nyu.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: Rotem Shavitt <rshavitt@gmail.com>
Signed-off-by: weizhou.lan@daocloud.io <weizhou.lan@daocloud.io>
Signed-off-by: TheDuyIT <nduy250299@gmail.com>
Signed-off-by: dtnguyen <dtnguyen@nvidia.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Yihuki <wangbovbvb@gmail.com>
Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Signed-off-by: Chaojun Zhang <chaojun.zhang@intel.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Yan Ma <yan.ma@intel.com>
Signed-off-by: QingZhou-YangHY <3868850350@qq.com>
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
Signed-off-by: zhejiangxiaomai <zhenhui.zhao@intel.com>
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
Signed-off-by: Ethan Feng <ethan.fengch@gmail.com>
Signed-off-by: ThibaultCastells <thib.castells@icloud.com>
Signed-off-by: linzm1007 <linzm1007@126.com>
Signed-off-by: Jesus De Jesus <dejesus.9297@gmail.com>
Signed-off-by: javierdejesusda <javier.dejesusj9@gmail.com>
Signed-off-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Kevin Luu <kevin@inferact.ai>
Signed-off-by: Zhewen Li <zhewen@inferact.ai>
Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Angela Yi <yiangela7@gmail.com>
Signed-off-by: aoshen524 <aoshen524@gmail.com>
Signed-off-by: Ao Shen <aoshen@inferact.ai>
Signed-off-by: nholmber <nholmber@users.noreply.github.com>
Signed-off-by: zhangtao <zhangtao2@modelbest.cn>
Signed-off-by: zhangtao2 <zhangtao2@modelbest.cn>
Signed-off-by: Aakif Nawaz <aakif.nawaz@amd.com>
Signed-off-by: Aditya Singh <adisin650@gmail.com>
Signed-off-by: Injae Ryou <injaeryou@gmail.com>
Signed-off-by: chunyang.wen <chunyang.wen@gmail.com>
Signed-off-by: jatseng-ai <jatseng@amd.com>
Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>
Signed-off-by: <callumm@amd.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: gnovack <gnovack@amazon.com>
Co-authored-by: george <george@inferact.ai>
Co-authored-by: Qiuyang Yue <yueqiuyang1389@gmail.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist <noreply@google.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Junyan Xu <junyanxu5513@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Gracie Guo (UX) <114208705+gracie-guo@users.noreply.github.com>
Co-authored-by: Gracie Guo <gracieguo@Gracies-MacBook-Pro.local>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Chaojun Zhang <chaojun.zhang@intel.com>
Co-authored-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Aaron Hao <ahao@anyscale.com>
Co-authored-by: Yifan Qiao <yifanqiao@inferact.ai>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: zhanqiuhu <49648934+ZhanqiuHu@users.noreply.github.com>
Co-authored-by: Sage <80211083+sagearc@users.noreply.github.com>
Co-authored-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Wang Yiwen <121547057+yiwen101@users.noreply.github.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Dao007forever <dao007forever@gmail.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
Co-authored-by: Doğaç Eldenk <dogacel@gmail.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
Co-authored-by: Terrence Zhao <32208165+Terrencezzj@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: pmaybank <113125070+pmaybank@users.noreply.github.com>
Co-authored-by: Izik Golan <47969623+izikgo@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Ronen Schaffer <ronen.schaffer@ibm.com>
Co-authored-by: Chris Leonard <chleonar@redhat.com>
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Artem Perevedentsev <aperevedents@nvidia.com>
Co-authored-by: xiangdong <40376367+zxd1997066@users.noreply.github.com>
Co-authored-by: hallerite <git@hallerite.com>
Co-authored-by: Ray Wang <roguerui6@gmail.com>
Co-authored-by: Rui Wang <raygorous@gmail.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: rishitdholakia13 <123388671+rishitdholakia13@users.noreply.github.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Joel Smith <j.smith9103@outlook.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: haosdent <haosdent@gmail.com>
Co-authored-by: meena-at-work <80416898+meena-at-work@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: akii96 <aakif.nawaz@amd.com>
Co-authored-by: Ace Eldeib <alexeldeib@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: sonusflow <git@sonusflow.pl>
Co-authored-by: Luciano Martins <22145370+lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: 손세정 <maze0717@g.skku.edu>
Co-authored-by: 세덩 <saison@sedeong-ui-MacBookAir.local>
Co-authored-by: sejung-son <sejung.son@nhn.com>
Co-authored-by: Bugen Zhao <i@bugenzhao.com>
Co-authored-by: Daoyuan Li <94409450+DaoyuanLi2816@users.noreply.github.com>
Co-authored-by: yzong-rh <yzong@redhat.com>
Co-authored-by: Yiyang "Ian" Liu <yiyangliu@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: zexplorerhj <zhjoneson@163.com>
Co-authored-by: zexplorerhj <19794632+zexplorerhj@users.noreply.github.com>
Co-authored-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Co-authored-by: Isotr0py <Isotr0py@outlook.com>
Co-authored-by: velonica0 <47554626+velonica0@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: Francesco Fusco <ffu@zurich.ibm.com>
Co-authored-by: anish <145943060+anishesg@users.noreply.github.com>
Co-authored-by: anish <anishesg@users.noreply.github.com>
Co-authored-by: Zheng Luo <zheluo@nvidia.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: ylangtsou <149562838+ylangtsou@users.noreply.github.com>
Co-authored-by: Ylang Tsou <ylangt@google.com>
Co-authored-by: fangyuchu <fangyuchu@qq.com>
Co-authored-by: zWaNg3 <389750525@qq.com>
Co-authored-by: Lanze Liu <86434077+liulanze@users.noreply.github.com>
Co-authored-by: Chengze Fan <fancz2002@gmail.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Simon Danielsson <70206058+simondanielsson@users.noreply.github.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: ruizhang <rza21.bc@gmail.com>
Co-authored-by: Rui Zhang <rui.zhang@globalrelay.net>
Co-authored-by: robertgshaw2-redhat <robertgshaw2@gmail.com>
Co-authored-by: qizixi <22851944+zixi-qi@users.noreply.github.com>
Co-authored-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Furkan F <id+git@yufufi.com>
Co-authored-by: tc-mb <157115220+tc-mb@users.noreply.github.com>
Co-authored-by: Weida Hong <wdhongtw@google.com>
Co-authored-by: mrjunwan-lang <mrjunwan@google.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: Ma Jian <jian1.ma@intel.com>
Co-authored-by: Tobias Wasner <wasnertobias@users.noreply.github.com>
Co-authored-by: Keyi Li <94494390+JasonKeyiL@users.noreply.github.com>
Co-authored-by: Keyi Li <likey6688@gmail.com>
Co-authored-by: Ilya Markov <markovilya197@gmail.com>
Co-authored-by: Markov Ilya <markovilya19@gmail.com>
Co-authored-by: sychen52 <41452870+sychen52@users.noreply.github.com>
Co-authored-by: Zhanda Zhu <49645678+zhandaz@users.noreply.github.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: SandishKumarHN <sandishkumarhn@gmail.com>
Co-authored-by: SandishKumarHN <3078999+SandishKumarHN@users.noreply.github.com>
Co-authored-by: Juhi Mittal <39641197+juhi10071998@users.noreply.github.com>
Co-authored-by: Itay Alroy <75032521+itayalroy@users.noreply.github.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Sumanth R Hegde <39546518+SumanthRH@users.noreply.github.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Ashwin Giridharan <ashwing@users.noreply.github.com>
Co-authored-by: abinggo <107740309+abinggo@users.noreply.github.com>
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>
Co-authored-by: Xiaogang Zhou <zhou16386@163.com>
Co-authored-by: xiaogang.zhou <xiaogang.zhou@bytedance.com>
Co-authored-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>
Co-authored-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com>
Co-authored-by: GuangYaoZheng <popkart06@gmail.com>
Co-authored-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Co-authored-by: ffggs <314137448@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Hua Huang <huangh1994@outlook.com>
Co-authored-by: Holegots <fuergaosi@gmail.com>
Co-authored-by: Siddharth Bedekar <104613085+bedeks@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Banani Ghosh <bg2502@nyu.edu>
Co-authored-by: Rotem Shavitt <rshavitt@gmail.com>
Co-authored-by: weizhoublue <45163302+weizhoublue@users.noreply.github.com>
Co-authored-by: Nguyễn Thế Duy <dtnguyen@nvidia.com>
Co-authored-by: Roy Wang <jasonailu87@gmail.com>
Co-authored-by: Yihuki <wangbovbvb@gmail.com>
Co-authored-by: Zhewen Li <zhewenli@meta.com>
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Huanyu Yang <20242081160@mail.dlut.edu.cn>
Co-authored-by: Thien Tran <gau.nernst@yahoo.com.sg>
Co-authored-by: zhao, zhenhui <zhenhui.zhao@intel.com>
Co-authored-by: Sting Lin <sting.lin@cienet.com>
Co-authored-by: Jie Fang <jief@nvidia.com>
Co-authored-by: Hank_ <37239608+ILikeIneine@users.noreply.github.com>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
Co-authored-by: Ethan Feng <ethan.fengch@gmail.com>
Co-authored-by: Thibault Castells <38716394+ThibaultCastells@users.noreply.github.com>
Co-authored-by: linzm1007 <96732179+linzm1007@users.noreply.github.com>
Co-authored-by: Javier De Jesus <javier.dejesusj9@gmail.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Zhewen Li <zhewen@inferact.ai>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Angela Yi <yiangela7@gmail.com>
Co-authored-by: aoshen02 <aoshen@inferact.ai>
Co-authored-by: aoshen524 <aoshen524@gmail.com>
Co-authored-by: Nico Holmberg <nico.holmberg@amd.com>
Co-authored-by: zhangtao2-1 <478679312@qq.com>
Co-authored-by: zhangtao <zhangtao2@modelbest.cn>
Co-authored-by: Aditya Singh <60082699+adityasingh2400@users.noreply.github.com>
Co-authored-by: Injae Ryou <injaeryou@gmail.com>
Co-authored-by: Chunyang Wen <chunyang.wen@gmail.com>
Co-authored-by: jatseng-ai <jatseng@amd.com>
Co-authored-by: Minh Vu <vuhoangminh97@gmail.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
…oject#43392)

Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Labels

kv-connector, ready, v1

Projects

Status: Done

5 participants