Merge upstream/main into matthias.awq_gemv#18

Merged
mgehre-amd merged 437 commits into matthias.awq_gemv from matthias.merge-upstream
Mar 24, 2026

@mgehre-amd
Owner

Summary

  • Merge upstream vllm-project/vllm main into matthias.awq_gemv
  • Brings in ~333 upstream commits
  • Conflicts resolved in CMakeLists.txt, kernels/linear/__init__.py, fused_moe/runner/default_moe_runner.py, layers/utils.py, models/qwen2_5_vl.py

Test plan

  • Benchmarked Qwen3-4B spec decode (20 prompts, --enforce-eager): 7.57 ms median TPOT (no regression vs. pre-merge 7.76 ms)
  • Benchmarked Qwen3-4B spec decode (20 prompts, cudagraph + compile_sizes=[3,6,12]): 7.56 ms median TPOT (no regression vs. pre-merge 7.79 ms)
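
For reference, the "no regression" check above can be sketched as a relative comparison of median TPOT before and after the merge. This is a minimal illustration, not the benchmark script used for this PR: the helper names (`median_tpot`, `relative_change`, `is_regression`) and the 2% tolerance are assumptions; only the four millisecond values come from the test plan.

```python
from statistics import median


def median_tpot(samples_ms):
    """Median time-per-output-token (ms) over per-prompt samples."""
    return median(samples_ms)


def relative_change(pre_ms, post_ms):
    """Return (post - pre) / pre; a negative value means post-merge is faster."""
    return (post_ms - pre_ms) / pre_ms


def is_regression(pre_ms, post_ms, tolerance=0.02):
    """Flag a regression only if post-merge is slower by more than `tolerance`.

    The 2% default is an assumed noise threshold, not a value from this PR.
    """
    return relative_change(pre_ms, post_ms) > tolerance


# (pre-merge, post-merge) median TPOT in ms, as quoted in the test plan.
runs = {
    "enforce-eager": (7.76, 7.57),
    "cudagraph + compile_sizes": (7.79, 7.56),
}

for name, (pre, post) in runs.items():
    change = relative_change(pre, post)
    print(f"{name}: {change:+.2%} regression={is_regression(pre, post)}")
```

Both configurations come out slightly faster after the merge, which matches the "no regression" note. With only 20 prompts per run, differences of this size may well be within run-to-run noise.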

juliendenize and others added 30 commits March 14, 2026 07:26
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: root <root@h200-bar-196-227.slurm-bar-compute.tenant-slurm.svc.cluster.local>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…mats (vllm-project#35109)

Signed-off-by: seanmamasde <seanmamasde@gmail.com>
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
…ject#32384)

Signed-off-by: Karan Bansal <karanb192@gmail.com>
Co-authored-by: Inokinoki <inoki@inoki.cc>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Lalithnarayan C <Lalithnarayan.C@amd.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Chinmay-Kulkarni-AMD <Chinmay.Kulkarni@amd.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…iio_connector to restore P/D functionality (vllm-project#34907)

Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: yitingw1 <yiting.wang@intel.com>
…xtral test (vllm-project#37138)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…subclasses in schema fuzz tests (vllm-project#37127)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…akiness (vllm-project#36442)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…odel loading (vllm-project#37136)

Signed-off-by: esmeetu <jasonailu87@gmail.com>
…de_stack guards instead of previous hacks (vllm-project#36204)

Signed-off-by: Laith Sakka <lsakka@meta.com>
…deo inputs (vllm-project#37147)

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…well (vllm-project#36987)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
fxdawnn and others added 20 commits March 19, 2026 19:55
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
…dates (vllm-project#37523)

Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Signed-off-by: Yuxiang Liang <yuliang@habana.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ct#37364)

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…project#37293)

Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
…llm-project#37461)

Signed-off-by: root <root@prenyx0169.a51.clusters.nvidia.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: <>
Co-authored-by: root <root@prenyx0169.a51.clusters.nvidia.com>
Co-authored-by: root <root@prenyx0042.a51.clusters.nvidia.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…lay (vllm-project#37639)

Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
… attention backend (vllm-project#37611)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Brings in 333 upstream commits. Conflicts resolved in:
- CMakeLists.txt (arch ordering)
- kernels/linear/__init__.py (both new kernels kept)
- fused_moe/runner/default_moe_runner.py (kept our init + upstream refactor)
- layers/utils.py (kept our tracing + upstream aiter support)
- models/qwen2_5_vl.py (kept our attn backend check + upstream model_tag removal)

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
- Remove stale ensure_dp_chunking_init() call (renamed to
  _maybe_init_dp_chunking and moved to __init__ upstream)
- Update use_fi_all2allv_kernels -> use_fi_nvl_two_sided_kernels
  in hip_w4a16_experts.py and exllama_moe.py to match upstream rename

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
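
Leftovers like the ones fixed in this follow-up commit can be caught by scanning the tree for identifiers that upstream renamed or removed. The sketch below is a hypothetical helper, not part of this PR or of vLLM: `find_stale_references` and the `STALE_NAMES` table are assumptions for illustration, and only the identifier names themselves are taken from the commit message above.

```python
import re
import sys
from pathlib import Path

# Old identifier -> note on what replaced it (names from the commit message).
STALE_NAMES = {
    "use_fi_all2allv_kernels": "renamed to use_fi_nvl_two_sided_kernels",
    "ensure_dp_chunking_init": "renamed to _maybe_init_dp_chunking, called from __init__",
}


def find_stale_references(root, stale_names=STALE_NAMES, suffix=".py"):
    """Return (path, line_number, old_name) for each whole-word match under root."""
    patterns = {
        old: re.compile(rf"\b{re.escape(old)}\b") for old in stale_names
    }
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for old, pattern in patterns.items():
                if pattern.search(line):
                    hits.append((str(path), lineno, old))
    return hits


if __name__ == "__main__":
    # Usage: python find_stale.py <source-root>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lineno, old in find_stale_references(root):
        print(f"{path}:{lineno}: {old} ({STALE_NAMES[old]})")
```

The whole-word match keeps the new names from being reported, since neither new identifier contains an old one as a complete word.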
Collaborator

@eble-amd eble-amd left a comment

I reviewed the files that you listed as having conflicts, but I didn't notice changes to any code that I am familiar with, so LGTM.

@mgehre-amd mgehre-amd removed the request for review from roberteg16 March 24, 2026 13:06
@mgehre-amd mgehre-amd marked this pull request as draft March 24, 2026 13:14
@mgehre-amd mgehre-amd marked this pull request as ready for review March 24, 2026 13:14
@mgehre-amd mgehre-amd merged commit 813a753 into matthias.awq_gemv Mar 24, 2026
2 checks passed