[Bugfix] rename enable_flash_comm_v1 back to enable_sp by realliujiaxu · Pull Request #6883 · vllm-project/vllm-ascend

realliujiaxu · 2026-02-28T16:47:38Z

What this PR does / why we need it?

PR #5632 introduced a bug by replacing some branches gated by enable_sp with enable_flash_comm_v1. As a result, when enable_shared_expert_dp is enabled alone (i.e., VLLM_ASCEND_ENABLE_FLASHCOMM1=0 and VLLM_ASCEND_ENABLE_FLASHCOMM=0), the behavior becomes inconsistent with the previous logic and leads to accuracy issues. This PR restores the original enable_sp-based branching to recover expected behavior and accuracy.
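
To make the regression concrete, here is a minimal sketch of how the two gates can diverge. It assumes, based only on the description above, that enable_sp is also true when enable_shared_expert_dp is set, while enable_flash_comm_v1 reflects the two environment variables alone. The `Flags` dataclass and the `_sketch` functions are hypothetical names for illustration; this is not the code in vllm_ascend/utils.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """Hypothetical stand-in for the relevant switches (not a vllm-ascend type)."""
    flashcomm1_env: bool = False           # VLLM_ASCEND_ENABLE_FLASHCOMM1
    flashcomm_env: bool = False            # VLLM_ASCEND_ENABLE_FLASHCOMM (legacy)
    enable_shared_expert_dp: bool = False  # additional_config option


def enable_flash_comm_v1_sketch(flags: Flags) -> bool:
    # Assumption: this gate only reflects the two environment variables.
    return flags.flashcomm1_env or flags.flashcomm_env


def enable_sp_sketch(flags: Flags) -> bool:
    # Assumption: SP is also treated as enabled when shared-expert DP is on.
    return enable_flash_comm_v1_sketch(flags) or flags.enable_shared_expert_dp


shared_dp_only = Flags(enable_shared_expert_dp=True)
# The two gates disagree for this configuration, which is the case the PR describes.
gates_disagree = (
    enable_sp_sketch(shared_dp_only) != enable_flash_comm_v1_sketch(shared_dp_only)
)
```

Under these assumptions, any branch that was switched from the first gate to the second silently stops running when only enable_shared_expert_dp is set, which matches the symptom reported here.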

Does this PR introduce any user-facing change?

No

How was this patch tested?

1. start server

vllm serve /home/weights/DeepSeek-V2-Lite-W8A8/  \
    --port 8001 \
    --served-model-name auto \
    --max-model-len 1024 \
    --enforce-eager \
    --tensor-parallel-size 2 \
    --data-parallel-size 2 \
    --gpu-memory-utilization 0.9 \
    --enable-expert-parallel \
    --additional-config '{"enable_shared_expert_dp": true}'
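
If the server is launched from a test script rather than typed by hand, the same command line can be assembled in Python. This is only a sketch: it reuses the flags and values from the command above unchanged, and serializes the additional config from a dict so the JSON quoting cannot drift. The variable names are illustrative.

```python
import json
import shlex

# Same option the PR tests: shared-expert DP on, FLASHCOMM env vars left unset.
additional_config = {"enable_shared_expert_dp": True}

argv = [
    "vllm", "serve", "/home/weights/DeepSeek-V2-Lite-W8A8/",
    "--port", "8001",
    "--served-model-name", "auto",
    "--max-model-len", "1024",
    "--enforce-eager",
    "--tensor-parallel-size", "2",
    "--data-parallel-size", "2",
    "--gpu-memory-utilization", "0.9",
    "--enable-expert-parallel",
    "--additional-config", json.dumps(additional_config),
]

# shlex.join quotes the JSON argument so the line can be pasted into a shell.
command_line = shlex.join(argv)
```

The `argv` list can be passed directly to `subprocess.Popen`, which avoids shell quoting altogether.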

2. curl

curl -s http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Hello. I have a question. Who are you?"}
  ],
  "max_tokens": 10,
  "temperature": 0.0,
  "ignore_eos_token": true
}'
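
For scripted accuracy checks, the curl call can be reproduced with the Python standard library. The sketch below sends the same JSON body to the same URL. It assumes the server returns an OpenAI-style chat completion body with `choices[0].message.content`; the helper names are hypothetical.

```python
import json
import urllib.request

URL = "http://localhost:8001/v1/chat/completions"

# Same body as the curl command above, field for field.
payload = {
    "model": "auto",
    "messages": [
        {"role": "user", "content": "Hello. I have a question. Who are you?"}
    ],
    "max_tokens": 10,
    "temperature": 0.0,
    "ignore_eos_token": True,
}


def build_request(url: str = URL) -> urllib.request.Request:
    """Build the same POST request the curl command sends."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_completion(url: str = URL) -> str:
    """Send the request and return the first choice's text (needs a running server)."""
    with urllib.request.urlopen(build_request(url)) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

Because the request uses temperature 0.0, the returned text should be largely repeatable, so the output of `first_completion()` can be compared between a build with the fix and one without it.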

vLLM version: v0.16.0
vLLM main: vllm-project/vllm@15d76f7

github-actions · 2026-02-28T16:47:52Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

gemini-code-assist · 2026-02-28T16:48:00Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving code consistency by renaming a key helper function related to sequence parallelism. The change standardizes the function's name across the codebase to match its configuration counterpart, without altering its functional behavior. This refactoring simplifies future maintenance and enhances readability by using a more descriptive and consistent naming scheme.

Highlights

Function Renaming: The helper function enable_flash_comm_v1 has been renamed to enable_sp to align with the pass_config.enable_sp naming convention.
Call Site Updates: All instances where enable_flash_comm_v1 was called have been updated to use the new enable_sp function.
Behavior Preservation: The underlying logic and behavior of the function remain unchanged, ensuring backward compatibility with existing environment variables.


Changelog

vllm_ascend/ascend_forward_context.py
- Updated import statement to use enable_sp instead of enable_flash_comm_v1.
- Replaced calls to enable_flash_comm_v1() with enable_sp(vllm_config).
vllm_ascend/ops/linear.py
- Updated import statement to use enable_sp instead of enable_flash_comm_v1.
- Replaced calls to enable_flash_comm_v1() with enable_sp().
vllm_ascend/ops/linear_op.py
- Updated import statement to use enable_sp instead of enable_flash_comm_v1.
- Replaced calls to enable_flash_comm_v1() with enable_sp() in various parallel operation checks.
vllm_ascend/platform.py
- Removed the import of enable_flash_comm_v1.
- Replaced calls to enable_flash_comm_v1() with enable_sp().
vllm_ascend/utils.py
- Removed the deprecated enable_flash_comm_v1 function.
- Integrated the environment variable checking logic from enable_flash_comm_v1 directly into the enable_sp function.
- Updated enable_dsa_cp to call enable_sp().
vllm_ascend/worker/model_runner_v1.py
- Removed the import of enable_flash_comm_v1.
- Added import for enable_sp_by_pass.
- Modified _pad_for_sequence_parallelism to check both enable_sp and enable_sp_by_pass.
- Replaced calls to enable_flash_comm_v1() with enable_sp().
vllm_ascend/worker/worker.py
- Updated import statement to use enable_sp instead of enable_flash_comm_v1.
- Replaced calls to enable_flash_comm_v1() with enable_sp() in execute_model.
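
For the _pad_for_sequence_parallelism item in the changelog above, here is a rough sketch of what a gate of that shape might look like. It assumes the padding rounds the token count up to a multiple of the tensor-parallel size and that "check both" means padding applies when either switch is on; the changelog states neither. The function and its parameters are hypothetical, not the method in model_runner_v1.py.

```python
def pad_for_sequence_parallelism_sketch(
    num_tokens: int,
    tp_size: int,
    sp_enabled: bool,
    sp_by_pass_enabled: bool,
) -> int:
    """Return the (possibly padded) token count.

    Assumption: padding applies when either SP switch is on and rounds
    num_tokens up to the next multiple of tp_size.
    """
    if not (sp_enabled or sp_by_pass_enabled):
        return num_tokens
    # Ceiling division, then scale back up to a multiple of tp_size.
    return -(-num_tokens // tp_size) * tp_size
```

With the test setup from the PR description (tensor-parallel size 2), this sketch would pad 5 tokens to 6 and leave 4 tokens unchanged.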


gemini-code-assist

Code Review

This pull request refactors the sequence parallelism helper function by renaming enable_flash_comm_v1 to enable_sp and updating all its call sites. The behavior is intended to be unchanged. My review confirms that the logic is preserved. However, the refactoring has left an unused vllm_config parameter in the new enable_sp function, which I've commented on. Addressing this would improve code clarity.

gemini-code-assist · 2026-02-28T16:50:05Z

vllm_ascend/utils.py

+        _ENABLE_SP = (
+            envs_ascend.VLLM_ASCEND_ENABLE_FLASHCOMM1
+            # Flash comm 1 should be enabled by env VLLM_ASCEND_ENABLE_FLASHCOMM1
+            # We retain the env VLLM_ASCEND_ENABLE_FLASHCOMM here for backward compatibility.
+            or bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM", "0")))
+        )


After this refactoring, the vllm_config parameter in the enable_sp function signature is no longer used within the function body. This makes the code less clear. Consider removing the vllm_config parameter and the associated logic that handles it when it's None. This would require updating all call sites to no longer pass this argument.


realliujiaxu requested review from MengqingCao, wangxiyuan, whx-sjtu and zzzzwwjj as code owners February 28, 2026 16:47

github-actions bot added the module:ops label Feb 28, 2026

github-actions bot added the module:core label Feb 28, 2026

realliujiaxu force-pushed the refactor/rename-enable-sp-back branch from 48e3efb to dd2d9f9 Compare February 28, 2026 16:48

realliujiaxu changed the title ~~refactor(sp): rename enable_flash_comm_v1 back to enable_sp~~ [Bugfix] rename enable_flash_comm_v1 back to enable_sp Feb 28, 2026

gemini-code-assist bot reviewed Feb 28, 2026


refactor(sp): rename enable_flash_comm_v1 back to enable_sp

11a3627

Restore the SP helper naming to enable_sp across runtime call sites to keep naming consistent with pass_config.enable_sp and existing SP semantics. Made-with: Cursor Signed-off-by: realliujiaxu <realliujiaxu@163.com>

realliujiaxu force-pushed the refactor/rename-enable-sp-back branch from dd2d9f9 to 11a3627 Compare March 1, 2026 01:15

realliujiaxu added the ready and ready-for-test labels Mar 1, 2026

zzzzwwjj approved these changes Mar 1, 2026


zzzzwwjj merged commit 5e24b26 into vllm-project:main Mar 1, 2026
78 of 83 checks passed
