
[Bugfix] Add MTP for opanpangu_pro_moe model, fix an initialization bug in StaticSinkAttention#32508

Open
yt0428 wants to merge 2 commits into vllm-project:main from yt0428:Add_MTP_for_openpangu_and_bugfix


@yt0428
Contributor

@yt0428 yt0428 commented Jan 17, 2026

Purpose

This PR extends #28775 by adding MTP (multi-token prediction) support for the opanpangu_pro_moe model. It also fixes an initialization bug in StaticSinkAttention.

For MTP support, the main change is a shallow copy of spec_decode_common_attn_metadata in gpu_model_runner.py. Building the StaticSinkAttention metadata may modify the block_table_tensor of common_metadata. Because spec_decode_common_attn_metadata is a direct reference to common_metadata (the same object), that modification also shows up in the speculative-decoding metadata. A simple shallow copy gives spec_decode_common_attn_metadata its own set of attribute references and avoids this.
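A minimal plain-Python sketch of the aliasing problem and of the shallow-copy fix. `CommonAttnMetadata` and `sink_builder` are hypothetical stand-ins, not vLLM's real classes: the block table is modelled as a list, and the builder is assumed to *rebind* `block_table_tensor` to a new object rather than write into it.

```python
from copy import copy
from dataclasses import dataclass


@dataclass
class CommonAttnMetadata:
    # Hypothetical stand-in for the common attention metadata; only the
    # field this PR touches is modelled (a list instead of a tensor).
    block_table_tensor: list


def sink_builder(cm: CommonAttnMetadata) -> None:
    # Hypothetical StaticSinkAttention-style build step (an assumption):
    # it rebinds the attribute to a new table with a sink block id prepended.
    cm.block_table_tensor = [0] + cm.block_table_tensor


# Before the fix: the spec-decode metadata is the *same object* as cm.
cm = CommonAttnMetadata(block_table_tensor=[7, 8, 9])
spec_alias = cm
sink_builder(cm)
print(spec_alias.block_table_tensor)  # [0, 7, 8, 9] -> the change leaked

# After the fix: a shallow copy has its own attribute bindings.
cm = CommonAttnMetadata(block_table_tensor=[7, 8, 9])
spec_copy = copy(cm)
sink_builder(cm)
print(spec_copy.block_table_tensor)  # [7, 8, 9] -> unaffected
```

One caveat of this approach: a shallow copy still shares the underlying list (or tensor) object, so an in-place write such as `cm.block_table_tensor[0] = 0` would remain visible through `spec_copy`. A shallow copy is sufficient only when the builder replaces the attribute rather than writing into the shared object.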

As for the initialization bug in StaticSinkAttention, we moved the CustomOp initialization to the beginning of StaticSinkAttention's initialization, so that it no longer overwrites member variables that were already set.
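The ordering problem can be illustrated with a plain-Python sketch. `CustomOpLike`, `sink_len`, and the two subclasses are hypothetical names chosen for illustration; the only assumption is that the base initializer assigns member variables, so calling it *after* the subclass has set them resets those values.

```python
class CustomOpLike:
    """Hypothetical base class whose __init__ assigns default member values."""

    def __init__(self) -> None:
        self.sink_len = 0  # default written by the base initializer


class BuggySinkAttention(CustomOpLike):
    def __init__(self, sink_len: int) -> None:
        self.sink_len = sink_len  # member set first...
        CustomOpLike.__init__(self)  # ...then reset to the default


class FixedSinkAttention(CustomOpLike):
    def __init__(self, sink_len: int) -> None:
        CustomOpLike.__init__(self)  # base initializer runs first
        self.sink_len = sink_len  # value is no longer overwritten


print(BuggySinkAttention(4).sink_len)  # 0
print(FixedSinkAttention(4).sink_len)  # 4
```

Running the base initializer first is the usual convention in Python class hierarchies for exactly this reason: later subclass assignments then take precedence over the base class defaults.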



…ticSinkAttention

Signed-off-by: yuantao <2422264527@qq.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for the opanpangu_pro_moe model and addresses an initialization bug in StaticSinkAttention. The changes include adding the new model type to speculative configuration and model architecture convertors, as well as refining the weight loading process for the openpangu_mtp model. A critical fix involves ensuring a shallow copy of attention metadata to prevent unintended side effects during speculative decoding initialization.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@yt0428
Contributor Author

yt0428 commented Feb 10, 2026

@DarkLight1337 @WoosukKwon @youkaichao @robertgshaw2-redhat @mgoin @tlrmchlsmth @houseroad @hmellor @yewentao256 @ProExpertProg
Hello, could you please review this small PR? Many thanks!

@yt0428 yt0428 changed the title Add MTP for opanpangu_pro_moe model, fix an initialization bug in StaticSinkAttention [Bugfix] Add MTP for opanpangu_pro_moe model, fix an initialization bug in StaticSinkAttention Feb 12, 2026
@mergify

mergify bot commented Feb 12, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @yt0428.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Feb 12, 2026
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
@mergify

mergify bot commented Mar 3, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @yt0428.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 3, 2026
-    spec_decode_common_attn_metadata = copy(cm)
-else:
-    spec_decode_common_attn_metadata = cm
+spec_decode_common_attn_metadata = copy(cm)
Collaborator


can we move this copy into StaticSinkAttentionBuilder?

Contributor Author


Unfortunately, the builder is not responsible for building spec_decode_common_attn_metadata; that is handled outside the builder, in gpu_model_runner.


Labels

bug (Something isn't working), needs-rebase, v1


2 participants