
[Bugfix] rename enable_flash_comm_v1 back to enable_sp #6883

Merged
zzzzwwjj merged 1 commit into vllm-project:main from realliujiaxu:refactor/rename-enable-sp-back
Mar 1, 2026

Conversation

@realliujiaxu
Collaborator

@realliujiaxu realliujiaxu commented Feb 28, 2026

What this PR does / why we need it?

PR #5632 introduced a bug by replacing some branches gated by enable_sp with enable_flash_comm_v1. As a result, when enable_shared_expert_dp is enabled alone (i.e., VLLM_ASCEND_ENABLE_FLASHCOMM1=0 and VLLM_ASCEND_ENABLE_FLASHCOMM=0), the behavior becomes inconsistent with the previous logic and leads to accuracy issues. This PR restores the original enable_sp-based branching to recover expected behavior and accuracy.
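The inconsistency can be illustrated with a minimal sketch. The helper names follow the PR, but the bodies below are simplified assumptions, not the actual vllm_ascend implementations:

```python
import os

# Hypothetical stand-ins for the two env toggles discussed in this PR;
# both are off, matching the failing configuration described above.
os.environ["VLLM_ASCEND_ENABLE_FLASHCOMM1"] = "0"
os.environ["VLLM_ASCEND_ENABLE_FLASHCOMM"] = "0"

def enable_flash_comm_v1() -> bool:
    # True only when one of the FLASHCOMM env vars is set.
    return (bool(int(os.environ["VLLM_ASCEND_ENABLE_FLASHCOMM1"]))
            or bool(int(os.environ["VLLM_ASCEND_ENABLE_FLASHCOMM"])))

def enable_sp(enable_shared_expert_dp: bool) -> bool:
    # Pre-#5632 semantics as described above: shared-expert DP alone is
    # enough to take the sequence-parallel branches.
    return enable_flash_comm_v1() or enable_shared_expert_dp

# With only enable_shared_expert_dp=True the two gates disagree, which is
# the inconsistency this PR fixes:
print(enable_flash_comm_v1())  # False: the buggy gate skips the SP branch
print(enable_sp(True))         # True: the restored gate takes it
```

With both env vars at 0 and `enable_shared_expert_dp` on, branches switched to the v1 gate by #5632 are silently skipped, which matches the accuracy regression described above.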

Does this PR introduce any user-facing change?

No

How was this patch tested?

1. Start the server

vllm serve /home/weights/DeepSeek-V2-Lite-W8A8/  \
    --port 8001 \
    --served-model-name auto \
    --max-model-len 1024 \
    --enforce-eager \
    --tensor-parallel-size 2 \
    --data-parallel-size 2 \
    --gpu-memory-utilization 0.9 \
    --enable-expert-parallel \
    --additional-config '{"enable_shared_expert_dp": true}'

2. Send a test request with curl

curl -s http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Hello. I have a question. Who are you?"}
  ],
  "max_tokens": 10,
  "temperature": 0.0,
  "ignore_eos_token": true
}'

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving code consistency by renaming a key helper function related to sequence parallelism. The change standardizes the function's name across the codebase to match its configuration counterpart, without altering its functional behavior. This refactoring simplifies future maintenance and enhances readability by using a more descriptive and consistent naming scheme.

Highlights

  • Function Renaming: The helper function enable_flash_comm_v1 has been renamed to enable_sp to align with the pass_config.enable_sp naming convention.
  • Call Site Updates: All instances where enable_flash_comm_v1 was called have been updated to use the new enable_sp function.
  • Behavior Preservation: The underlying logic and behavior of the function remain unchanged, ensuring backward compatibility with existing environment variables.


Changelog
  • vllm_ascend/ascend_forward_context.py
    • Updated import statement to use enable_sp instead of enable_flash_comm_v1.
    • Replaced calls to enable_flash_comm_v1() with enable_sp(vllm_config).
  • vllm_ascend/ops/linear.py
    • Updated import statement to use enable_sp instead of enable_flash_comm_v1.
    • Replaced calls to enable_flash_comm_v1() with enable_sp().
  • vllm_ascend/ops/linear_op.py
    • Updated import statement to use enable_sp instead of enable_flash_comm_v1.
    • Replaced calls to enable_flash_comm_v1() with enable_sp() in various parallel operation checks.
  • vllm_ascend/platform.py
    • Removed the import of enable_flash_comm_v1.
    • Replaced calls to enable_flash_comm_v1() with enable_sp().
  • vllm_ascend/utils.py
    • Removed the deprecated enable_flash_comm_v1 function.
    • Integrated the environment variable checking logic from enable_flash_comm_v1 directly into the enable_sp function.
    • Updated enable_dsa_cp to call enable_sp().
  • vllm_ascend/worker/model_runner_v1.py
    • Removed the import of enable_flash_comm_v1.
    • Added import for enable_sp_by_pass.
    • Modified _pad_for_sequence_parallelism to check both enable_sp and enable_sp_by_pass.
    • Replaced calls to enable_flash_comm_v1() with enable_sp().
  • vllm_ascend/worker/worker.py
    • Updated import statement to use enable_sp instead of enable_flash_comm_v1.
    • Replaced calls to enable_flash_comm_v1() with enable_sp() in execute_model.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@realliujiaxu realliujiaxu force-pushed the refactor/rename-enable-sp-back branch from 48e3efb to dd2d9f9 on February 28, 2026 16:48
@realliujiaxu realliujiaxu changed the title from "refactor(sp): rename enable_flash_comm_v1 back to enable_sp" to "[Bugfix] rename enable_flash_comm_v1 back to enable_sp" Feb 28, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the sequence parallelism helper function by renaming enable_flash_comm_v1 to enable_sp and updating all its call sites. The behavior is intended to be unchanged. My review confirms that the logic is preserved. However, the refactoring has left an unused vllm_config parameter in the new enable_sp function, which I've commented on. Addressing this would improve code clarity.

Comment on lines +733 to +738
_ENABLE_SP = (
    envs_ascend.VLLM_ASCEND_ENABLE_FLASHCOMM1
    # Flash comm 1 should be enabled by env VLLM_ASCEND_ENABLE_FLASHCOMM1.
    # We retain the env VLLM_ASCEND_ENABLE_FLASHCOMM here for backward compatibility.
    or bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM", "0")))
)
Contributor


Severity: high

After this refactoring, the vllm_config parameter in the enable_sp function signature is no longer used within the function body. This makes the code less clear. Consider removing the vllm_config parameter and the associated logic that handles it when it's None. This would require updating all call sites to no longer pass this argument.
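The suggested cleanup would look roughly like this. It is a sketch under the assumption that `enable_sp` currently takes an optional `vllm_config` it never reads; the env-var body mirrors the snippet quoted above:

```python
import os

# Before (the shape the reviewer is commenting on, simplified): the
# vllm_config parameter is accepted but never read.
def enable_sp_before(vllm_config=None) -> bool:
    return (bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM1", "0")))
            or bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM", "0"))))

# After the proposed cleanup: the dead parameter is dropped, and call
# sites change from enable_sp(vllm_config) to enable_sp().
def enable_sp() -> bool:
    return (bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM1", "0")))
            or bool(int(os.getenv("VLLM_ASCEND_ENABLE_FLASHCOMM", "0"))))

print(enable_sp() == enable_sp_before())  # True: behavior is unchanged
```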

Restore the SP helper naming to enable_sp across runtime call sites to keep naming consistent with pass_config.enable_sp and existing SP semantics.

Made-with: Cursor
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
@realliujiaxu realliujiaxu force-pushed the refactor/rename-enable-sp-back branch from dd2d9f9 to 11a3627 on March 1, 2026 01:15
@realliujiaxu realliujiaxu added the `ready` (read for review) and `ready-for-test` (start test by label for PR) labels Mar 1, 2026
@zzzzwwjj zzzzwwjj merged commit 5e24b26 into vllm-project:main Mar 1, 2026
78 of 83 checks passed
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
…6883)

(The commit message repeats the PR description and test commands above.)

- vLLM version: v0.16.0
- vLLM main:
vllm-project/vllm@15d76f7

Signed-off-by: realliujiaxu <realliujiaxu@163.com>
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026

Labels

module:core, module:ops, ready (read for review), ready-for-test (start test by label for PR)


2 participants