Merge upstream and validate #4
Merged
…ject#42038) Signed-off-by: dqzhengAP <dqzheng1996@gmail.com>
…vllm-project#42037) Signed-off-by: haosdent <haosdent@gmail.com>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
… dist-info registration for XPU/ROCm (vllm-project#42040) Signed-off-by: dqzhengAP <dqzheng1996@gmail.com> Signed-off-by: David Zheng <153074367+dzhengAP@users.noreply.github.com> Signed-off-by: Andreas Karatzas <akaratza@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Andreas Karatzas <akaratza@amd.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
…parameter (vllm-project#42061) Signed-off-by: Markus Hartikainen <markus.hartikainen@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Signed-off-by: Terrencezzj <terrence@cohere.ai> Co-authored-by: Cursor <cursoragent@cursor.com>
…ol parser (vllm-project#42026) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com>
Signed-off-by: Richard Barnes <rbarnes@meta.com> Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
… operand layout with WGMMA (vllm-project#42076) Signed-off-by: kermit <ckeming@outlook.com>
…ng CUDA graph capture failure (vllm-project#42070) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: ZhanqiuHu <zhu@redhat.com>
…r issue (vllm-project#40708) Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
Signed-off-by: SoluMilken <ypiheyn.imm02g@g2.nctu.edu.tw>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
…ivations) support (vllm-project#41769) Signed-off-by: Juhi Mittal <juhim@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
…atures without `KVCacheConfig` (vllm-project#39832)

The v0.12.0 release contained initial support for HMA in KV connectors. As part of these changes, a `KVCacheConfig` argument was added to KV connector constructors. Backwards-compatibility support for out-of-tree connectors was included in that change, along with a very prominent warning. See vllm-project#25712 and vllm-project#27887.

Since the warning has been in place for over 5 months, we can safely remove the compatibility support.

Signed-off-by: yewentao256 <zhyanwentao@126.com>
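For out-of-tree connector authors, a rough way to check whether a connector class would survive the removal is to inspect its constructor signature. The sketch below is illustrative only: the parameter name `kv_cache_config`, the two connector classes, and the helper `accepts_kv_cache_config` are assumptions made for this example, not code from the commit.

```python
import inspect


class LegacyConnector:
    """Hypothetical out-of-tree connector using the old two-argument form."""

    def __init__(self, vllm_config, role):
        self.vllm_config = vllm_config
        self.role = role


class UpdatedConnector:
    """Hypothetical connector that also accepts the KV cache config."""

    def __init__(self, vllm_config, role, kv_cache_config=None):
        self.vllm_config = vllm_config
        self.role = role
        self.kv_cache_config = kv_cache_config


def accepts_kv_cache_config(connector_cls) -> bool:
    """Return True if the constructor can receive `kv_cache_config`.

    The parameter name is an assumption for this sketch.
    """
    params = inspect.signature(connector_cls.__init__).parameters
    if "kv_cache_config" in params:
        return True
    # A **kwargs catch-all would also swallow the keyword argument.
    return any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


for cls in (LegacyConnector, UpdatedConnector):
    status = "ok" if accepts_kv_cache_config(cls) else "needs updating"
    print(f"{cls.__name__}: {status}")
```

A connector reported as "needs updating" would have relied on the compatibility path that this commit removes.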
Signed-off-by: roG0d <baonudesifeizhai@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…llm-project#41846) Signed-off-by: Nave Assaf <nassaf@nvidia.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
vllm-project#33322) Signed-off-by: Xingran Wang <wangxingran123456@outlook.com> Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com> Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>
…llm-project#42176) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
…0951) Signed-off-by: Christian Van <cvan20191@gmail.com> Co-authored-by: Christian Van <cvan20191@gmail.com>
…ject#39306) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Signed-off-by: Itay Etelis <etelis2019@gmail.com> Signed-off-by: Itay Etelis <92247226+Etelis@users.noreply.github.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Itay Etelis <etelis2019@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…roject#41573) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…2272) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…usable-buffer loaders like runai distributed streaming (vllm-project#42244) Signed-off-by: Noa Neria <nneria@nvidia.com>
…ect#41928) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
…t#40392) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Patrick Schlangen <pschlan@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
…dingManager` (vllm-project#41727) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…modelopt_quant_algo` (vllm-project#42181) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
…42081) Signed-off-by: yewentao256 <zhyanwentao@126.com>
…mprovement (vllm-project#40408) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
…ject#41812) Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Vinay Damodaran <vrdn@hey.com> Signed-off-by: Vinay R Damodaran <vrdn@hey.com> Co-authored-by: Russell Bryant <russell.bryant@gmail.com>
…lm-project#41987) Signed-off-by: alexagriffith <agriffith96@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
…llm-project#42201) Signed-off-by: vensen <vensenmu@gmail.com> Signed-off-by: Vensen <vensenmu@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…ject#41825) Signed-off-by: Frida Andersson <fanderss@amd.com> Signed-off-by: Chuan Li <chuali@amd.com> Co-authored-by: Markus Hartikainen <markus.hartikainen@amd.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Chuan Li <chuali@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Frida Andersson <frida-andersson@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Jee Jee Li <jeejeelee@inferact.ai> Co-authored-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…ect#41761) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…oject#36517) Signed-off-by: Patrick Schlangen <pschlan@amd.com>
…uired tool_choice (vllm-project#42292) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…42217) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>
Resolved two conflicts in vllm/model_executor/layers/deepseek_v4_attention.py:
* Decode path: dropped the
``VLLM_ROCM_USE_V4_TRITON_FALLBACK``-gated ``rocm_forward_decode_fallback``
branch — upstream unified the call to ``flash_mla_with_kvcache`` for
both CUDA and ROCm. The ROCm path is already routed to our
``flash_mla_with_kvcache_rocm`` Triton kernel via
``vllm.v1.attention.ops.flashmla`` (which already accepts the new
``is_fp8_kvcache``/``extra_k_cache``/``extra_indices_in_kvcache``
kwargs).
* Prefill path: dropped the env-gated branch around
``flash_mla_sparse_fwd`` and adopted upstream's signature (no longer
returns a 3-tuple). Our ``flash_mla_sparse_fwd_rocm`` writes via
``out=`` so the return value is harmless to ignore.
Post-merge cleanup:
* vllm/platforms/rocm.py: removed our duplicate "deepseek_v4_fp8"
entry — upstream now adds it as the first member of
``supported_quantization``.
* vllm/envs.py: trimmed the ``VLLM_ROCM_USE_V4_TRITON_FALLBACK``
docstring from four call sites down to two (SWA K-cache writer and
sparse indexer). The MLA decode / sparse-prefill paths are now
permanently routed through the ROCm Triton fallbacks via flashmla.py
on ROCm — no env-var toggle needed there any more.
Kept (still required after the merge):
* vllm/model_executor/layers/sparse_attn_indexer.py — dispatch to
``rocm_sparse_attn_indexer_no_insert`` when
skip_k_cache_insert + AITER disabled + env-var on.
* vllm/v1/attention/ops/rocm_sparse_attn_indexer.py (recovered
pre-rebase orchestration).
* vllm/v1/attention/ops/rocm_flash_mla_sparse.py +
flashmla.py ROCm dispatch.
* vllm/model_executor/models/deepseek_v4.py:
``_resolve_deepseek_v4_expert_dtype`` — still required because
upstream's new cached property only honours an explicit
``hf_config.expert_dtype`` and otherwise defaults to ``"fp4"``,
misrouting FP8 checkpoints that ship without the field.
* The Python SWA K-cache writer reference + env-gate around the
HIPified ``fused_deepseek_v4_qnorm_rope_kv_rope_quant_insert``
C++ kernel (still buggy on MI300X / FNUZ).
Backup tag: pre-upstream-merge-0512.
Co-authored-by: Cursor <cursoragent@cursor.com>
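The reason for keeping ``_resolve_deepseek_v4_expert_dtype`` can be illustrated with a minimal sketch. It assumes, purely for illustration, that a checkpoint without ``expert_dtype`` can be recognised as FP8 from a ``quantization_config`` dict with a ``quant_method`` key; the real helper may use a different signal. The function name `resolve_expert_dtype` and the config objects are hypothetical stand-ins.

```python
from types import SimpleNamespace


def resolve_expert_dtype(hf_config) -> str:
    """Pick the expert dtype, preferring an explicit config field.

    Assumption: when `expert_dtype` is absent, an FP8 checkpoint can be
    detected from `quantization_config["quant_method"]`.
    """
    explicit = getattr(hf_config, "expert_dtype", None)
    if explicit is not None:
        # An explicit field always wins, as in the upstream property.
        return explicit

    quant_cfg = getattr(hf_config, "quantization_config", None) or {}
    quant_method = str(quant_cfg.get("quant_method", "")).lower()
    if "fp8" in quant_method:
        # Without this branch the checkpoint would fall through to "fp4".
        return "fp8"
    return "fp4"


# Hypothetical configs standing in for real HF config objects.
explicit_cfg = SimpleNamespace(expert_dtype="fp4")
fp8_without_field = SimpleNamespace(quantization_config={"quant_method": "fp8"})
bare_cfg = SimpleNamespace()

print(resolve_expert_dtype(explicit_cfg))
print(resolve_expert_dtype(fp8_without_field))
print(resolve_expert_dtype(bare_cfg))
```

The middle case is the one the commit message describes: an FP8 checkpoint that ships without the field and would otherwise be treated as ``"fp4"``.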
Upstream-added ``mhc_fused_post_pre`` calls three tilelang kernels (``mhc_fused_tilelang``, ``mhc_post_tilelang``, ``mhc_pre_big_fuse_tilelang``) that all use Programmatic Dependent Launch (PDL, Hopper-only). On ROCm, tilelang's ``MarkCudaSyncCalls`` raises ``PDL is not supported`` at JIT-compile time, taking down every TP worker during profile_run:

    [TileLang:...]: TileLang begins to compile kernel `mhc_post_tilelang`
    tvm.error.InternalError: Check failed: ... PDL is not supported

The non-fused ``mhc_pre`` and ``mhc_post`` already carry torch ROCm fallbacks; this commit composes them to back the fused op on ROCm, matching the contract (a 4-tuple of residual_cur / post_mix_cur / comb_mix_cur / layer_input_cur with exactly the same shapes and dtypes as the tilelang path). The CUDA path is untouched.

This unblocks DSv4-Flash-Base-FP8 profile_run on MI300X after the upstream merge that wired the fused op into the layer forward path.

Co-authored-by: Cursor <cursoragent@cursor.com>