[Bugfix] Add MTP for opanpangu_pro_moe model, fix an initialization bug in StaticSinkAttention #32508
yt0428 wants to merge 2 commits into vllm-project:main
…ticSinkAttention Signed-off-by: yuantao <2422264527@qq.com>
Code Review
This pull request introduces support for the opanpangu_pro_moe model and addresses an initialization bug in StaticSinkAttention. The changes include adding the new model type to speculative configuration and model architecture convertors, as well as refining the weight loading process for the openpangu_mtp model. A critical fix involves ensuring a shallow copy of attention metadata to prevent unintended side effects during speculative decoding initialization.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
@DarkLight1337 @WoosukKwon @youkaichao @robertgshaw2-redhat @mgoin @tlrmchlsmth @houseroad @hmellor @yewentao256 @ProExpertProg
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
spec_decode_common_attn_metadata = copy(cm)
else:
spec_decode_common_attn_metadata = cm
spec_decode_common_attn_metadata = copy(cm)
can we move this copy into StaticSinkAttentionBuilder?
Unfortunately, the builder is not responsible for building spec_decode_common_attn_metadata; that is handled outside the builder, in gpu_model_runner.
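To make the aliasing problem behind this thread concrete, here is a minimal sketch in plain Python. `CommonMetadata` and `build_with_sink` are hypothetical stand-ins, not vLLM classes or functions, and the sketch assumes the builder rebinds the `block_table_tensor` field to a new object rather than mutating the tensor in place.

```python
from copy import copy
from dataclasses import dataclass


@dataclass
class CommonMetadata:
    """Hypothetical stand-in for the common attention metadata object."""
    block_table_tensor: list
    num_reqs: int


def build_with_sink(cm: CommonMetadata) -> CommonMetadata:
    # Hypothetical builder step: rebinds the field to a new object
    # (assumption: the real builder also reassigns the attribute).
    cm.block_table_tensor = [0] + cm.block_table_tensor
    return cm


cm = CommonMetadata(block_table_tensor=[5, 6], num_reqs=1)

alias = cm           # direct reference: same object as cm
snapshot = copy(cm)  # shallow copy: new object, fields initially shared

build_with_sink(cm)

# The alias observes the rebinding; the shallow copy keeps the old field.
alias_changed = alias.block_table_tensor == [0, 5, 6]
snapshot_unchanged = snapshot.block_table_tensor == [5, 6]
```

A shallow copy only protects against attribute rebinding. If the builder mutated the underlying tensor in place, the copy would still share that tensor and see the change.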
Purpose
This PR adds MTP support for the opanpangu_pro_moe model, following up on #28775, and also fixes an initialization bug in `StaticSinkAttention`.

For MTP support, the major modification is a shallow copy of `spec_decode_common_attn_metadata` in `gpu_model_runner.py`. The block_table_tensor of `common_metadata` may be modified while StaticSinkAttention is being built. Because `spec_decode_common_attn_metadata` is a direct reference to `common_metadata`, that modification also affects it. A simple shallow copy avoids this.

As for the initialization bug in `StaticSinkAttention`, we move the init of `CustomOp` to the beginning of the initialization so that member variables are not refreshed after they have been set.

Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.
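The initialization-order fix described above can be illustrated with a small sketch. The real class hierarchy of `StaticSinkAttention` is not shown in this description, so every class below (`Registry`, `AttentionLike`, `OpLike`, `LateInit`, `EarlyInit`) is a hypothetical stand-in. The sketch assumes only that one parent initializer resets state that another parent initializer has already populated.

```python
class Registry:
    """Hypothetical base class whose __init__ resets instance state."""

    def __init__(self):
        # Running this a second time wipes earlier registrations.
        self.registered = {}


class AttentionLike(Registry):
    def __init__(self):
        Registry.__init__(self)
        self.registered["impl"] = "attention-impl"


class OpLike(Registry):
    def __init__(self):
        Registry.__init__(self)


class LateInit(AttentionLike, OpLike):
    def __init__(self):
        AttentionLike.__init__(self)
        OpLike.__init__(self)  # resets `registered`, losing "impl"


class EarlyInit(AttentionLike, OpLike):
    def __init__(self):
        OpLike.__init__(self)  # reset happens first
        AttentionLike.__init__(self)  # state set afterwards survives


late = LateInit()
early = EarlyInit()
```

In this sketch `late.registered` ends up empty while `early.registered` still holds the `"impl"` entry, which is the effect of running the resetting initializer first.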