[Feat] Flashcomm2 use o_shared linear #4188
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces support for shared o_proj linear layers for Flashcomm2, which involves changes across configuration, distributed state management, and the attention mechanism. The core logic for shared weights is implemented in vllm_ascend/torchair/ops/shared_weight_layer.py, which has been refactored for better usability.
My review focuses on ensuring the correctness and robustness of the new feature. I've identified a few critical issues:
- Incorrect validation logic for the new `flashcomm2_oproj_shared` configuration that could lead to silent failures.
- A potential crash in the shared weight layer logic when handling a series with a single layer.
I have provided suggestions to fix these issues. The rest of the changes look good and the refactoring of the shared weight layer API is a nice improvement.
```python
if self.flashcomm2_oproj_tensor_parallel_size is None:
    raise AssertionError(
        "flashcomm2_oproj_shared must be enabled simultaneously with flashcomm2_oproj_tensor_parallel_size"
    )
```
The validation if self.flashcomm2_oproj_tensor_parallel_size is None: is incorrect. The value of self.flashcomm2_oproj_tensor_parallel_size is an integer returned from get_flashcomm2_config_and_validate (which gets it from an environment variable with a default of 0), so it will never be None. The check should be against 0, as flashcomm2_oproj_shared requires flashcomm2_oproj_tensor_parallel_size to be greater than 0.
```diff
-if self.flashcomm2_oproj_tensor_parallel_size is None:
-    raise AssertionError(
-        "flashcomm2_oproj_shared must be enabled simultaneously with flashcomm2_oproj_tensor_parallel_size"
-    )
+if self.flashcomm2_oproj_tensor_parallel_size == 0:
+    raise AssertionError(
+        "flashcomm2_oproj_shared must be enabled with flashcomm2_oproj_tensor_parallel_size > 0"
+    )
```
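A minimal sketch of why the `is None` branch can never fire: the value comes from the `VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE` environment variable parsed as an integer with a default of 0 (the helper name below is hypothetical; only the parsing expression mirrors the registered lambda).

```python
import os

# Hypothetical helper mirroring how the config value is derived: an env var
# parsed as int with a default of 0, so the result is always an int, never None.
def get_oproj_tp_size() -> int:
    return int(os.getenv("VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE", 0))

size = get_oproj_tp_size()
assert size is not None       # the `is None` branch can never be taken
assert isinstance(size, int)  # so the guard must compare against 0 instead
```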
```python
self.layers.sort(key=lambda x: x.layer_idx)
self.num_layers = len(self.layers)
assert self.num_layers > 0, "No layers in the series"
assert self.prefetch_step >= 0 and self.prefetch_step <= self.num_layers - 2, "prefetch_step must be in [0, num_layers - 2]"
```
The assertion self.prefetch_step <= self.num_layers - 2 will cause a crash if a shared weight series contains only one layer (self.num_layers == 1), because self.num_layers - 2 would be -1. For a single-layer series, prefetching is not applicable, and prefetch_step should be 0. To prevent this crash, the assertion should be adjusted to handle this edge case.
```diff
-assert self.prefetch_step >= 0 and self.prefetch_step <= self.num_layers - 2, "prefetch_step must be in [0, num_layers - 2]"
+assert self.prefetch_step >= 0 and self.prefetch_step <= max(0, self.num_layers - 2), "prefetch_step must be in [0, num_layers - 2]"
```
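A quick check of the suggested clamp: for a single-layer series the upper bound becomes `max(0, 1 - 2) == 0`, so `prefetch_step == 0` passes instead of being compared against an upper bound of -1 (the standalone predicate below is for illustration only).

```python
# Extracted predicate for the suggested assertion bound.
def prefetch_step_ok(prefetch_step: int, num_layers: int) -> bool:
    return 0 <= prefetch_step <= max(0, num_layers - 2)

print(prefetch_step_ok(0, 1))  # True: single-layer series is valid with the clamp
print(0 <= 0 <= 1 - 2)         # False: the original bound rejects it (0 <= -1)
print(prefetch_step_ok(1, 3))  # True: three layers allow prefetch_step up to 1
```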
```python
if flashcomm2_oproj_shared:
    if flashcomm2_oproj_tp_size is None:
        raise AssertionError(
            "flashcomm2_oproj_shared must be enabled simultaneously with flashcomm2_oproj_tensor_parallel_size"
        )
    logger.info("Enable Flashcomm2 with flashcomm2_oproj_shared")
```
This validation logic for flashcomm2_oproj_shared is redundant with the logic in vllm_ascend/ascend_config.py. It's better to have validation in one place to avoid inconsistencies. Since ascend_config.py is the configuration entry point, it's a better place for this check. Additionally, the check if flashcomm2_oproj_tp_size is None: is incorrect, as flashcomm2_oproj_tp_size is an integer. I've suggested a fix in ascend_config.py and recommend removing this redundant block.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@wangxiyuan this PR is ready, please help merge it in.
wangxiyuan left a comment
please update the doc as well https://docs.vllm.ai/projects/ascend/en/latest/tutorials/DeepSeek-V3.1.html
```python
# between this feature and FLASHCOMM1, please refer to the feature guide in the documentation.
"VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE":
lambda: int(os.getenv("VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE", 0)),
"VLLM_ASCEND_ENABLE_FLASHCOMM2_OSHARED":
```
Add a note describing how to use this env variable.
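A hedged sketch of the usage note being requested: how a user would enable the feature and how the flags are parsed. The boolean interpretation of `VLLM_ASCEND_ENABLE_FLASHCOMM2_OSHARED` below is an assumption for illustration; only the int parsing of the parallel-size variable mirrors the registered lambda above.

```python
import os

# Simulate a user enabling the feature (values chosen for illustration).
os.environ["VLLM_ASCEND_ENABLE_FLASHCOMM2_OSHARED"] = "1"
os.environ["VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE"] = "2"

# Assumed parsing: "1" -> enabled; parallel size parsed as int, default 0.
oproj_shared = bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM2_OSHARED", "0")))
oproj_tp_size = int(os.getenv("VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE", 0))

# Per the review above, enabling o_shared requires a positive oproj TP size.
assert not oproj_shared or oproj_tp_size > 0
print(oproj_shared, oproj_tp_size)  # True 2
```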
…domain in sfa-cp, and fix the mtp weight load in pp>1 situation (#4913)

### What this PR does / why we need it?
In PR #4188, a small bug was introduced that caused sfa-cp to be unable to find the global_pp_size parameter during initialization, and this PR fixed the issue.
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
```python
group_ranks = []
for pp_idx in range(global_pp_size):
    group = []
    for dp_idx in range(global_dp_size):
```
How can this be adapted to PCP?
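A hypothetical completion of the truncated loop above, showing how group ranks might be enumerated across the pipeline-parallel (pp) and data-parallel (dp) dimensions. The rank layout formula is an assumption for illustration, not the PR's exact code.

```python
# Assumed layout: ranks ordered as (pp, dp, tp) with tp fastest-varying.
def build_group_ranks(global_pp_size: int, global_dp_size: int, tp_size: int):
    group_ranks = []
    for pp_idx in range(global_pp_size):
        group = []
        for dp_idx in range(global_dp_size):
            base = (pp_idx * global_dp_size + dp_idx) * tp_size
            group.extend(range(base, base + tp_size))
        group_ranks.append(group)
    return group_ranks

print(build_group_ranks(2, 2, 2))
# [[0, 1, 2, 3], [4, 5, 6, 7]] under the assumed layout
```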
What this PR does / why we need it?
The Flashcomm2 technical report mentions that FC2 introduces fully redundant storage of the o_proj matrix, which puts pressure on memory. The report proposed a compromise solution using otp2, but that introduces additional reduce-scatter communication.
We propose a shared linear feature (#2931) that distributes weights layer by layer across cards, avoiding the need for TP splitting and solving the memory issue.
This PR depends on #3232 and #2931.
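The layer-by-layer distribution described above can be sketched as follows: instead of TP-splitting o_proj, each rank stores the full weight for only a subset of layers and serves it to peers when needed. The round-robin assignment and the function name below are assumptions for illustration, not the PR's actual implementation.

```python
# Hypothetical owner map: each layer's full o_proj weight lives on one rank.
def assign_layers_to_ranks(num_layers: int, world_size: int) -> dict:
    """Map each layer index to the rank holding its full o_proj weight (round-robin)."""
    return {layer: layer % world_size for layer in range(num_layers)}

owners = assign_layers_to_ranks(num_layers=8, world_size=4)
print(owners)  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
# Per-rank storage drops from num_layers full copies to ~num_layers/world_size.
```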
Flashcomm2 flowchart
Does this PR introduce any user-facing change?
Yes: the feature is controlled via environment variables.