
[Model] Add shared_head to prefix of SharedHead.#27193

Closed
whx-sjtu wants to merge 1 commit into vllm-project:main from whx-sjtu:shared_head_prefix

Conversation

@whx-sjtu
Contributor

@whx-sjtu whx-sjtu commented Oct 20, 2025

Purpose

Add shared_head to the prefix of SharedHead. The prefix is used in quantization scenarios to determine the quantization type of a given layer.
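To illustrate why the prefix matters: quantization configs typically decide per-layer behavior by matching a layer's dotted module name against configured patterns, so a missing `shared_head` path component breaks the lookup. The helper and layer names below are simplified stand-ins, not vLLM's actual code:

```python
# Sketch (assumed names): a quant config skips layers whose dotted
# prefix matches an ignore rule, either exactly or as a parent path.
def is_layer_skipped(prefix: str, ignored_layers: list[str]) -> bool:
    """Return True if the layer named by `prefix` should stay unquantized."""
    return any(
        prefix == name or prefix.startswith(name + ".")
        for name in ignored_layers
    )

ignored = ["model.layers.61.shared_head.head"]

# Without the "shared_head" component the name does not match the rule:
print(is_layer_skipped("model.layers.61.head", ignored))              # False
# With the prefix fixed, the names line up and the rule applies:
print(is_layer_skipped("model.layers.61.shared_head.head", ignored))  # True
```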

Test Plan

No extra test needed.

Test Result

All CI should pass.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: whx-sjtu <2952154980@qq.com>
@whx-sjtu whx-sjtu requested a review from luccafong as a code owner October 20, 2025 12:31
@mergify bot added the deepseek (Related to DeepSeek models) label on Oct 20, 2025
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request correctly adds the shared_head prefix when instantiating SharedHead in both deepseek_mtp.py and glm4_moe_mtp.py. This change is necessary for quantization scenarios to correctly determine the quantization type for the layer. The implementation is straightforward and correct. However, I've identified a code duplication issue with the SharedHead class itself, which is defined identically in both modified files. I've left a comment with a suggestion to refactor this for better maintainability.

@@ -75,7 +75,9 @@ def __init__(self, vllm_config: VllmConfig, prefix: str) -> None:
topk_indices_buffer = None

self.shared_head = SharedHead(
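The one-line change at this call site can be sketched as follows; the class body and argument names are simplified stand-ins for the real deepseek_mtp.py code, kept only to show the effect on the registered module name:

```python
# Minimal runnable sketch of the fix (SharedHead here is a stub).
class SharedHead:
    def __init__(self, prefix: str = "") -> None:
        # A quantization config would later look this name up.
        self.prefix = prefix

parent_prefix = "model.layers.61"  # hypothetical parent module name

# Before: SharedHead received no prefix, so its sub-layers were
# registered without the "shared_head" path component.
old = SharedHead()

# After: the module name is appended to the parent prefix.
new = SharedHead(prefix=f"{parent_prefix}.shared_head")

print(new.prefix)  # → model.layers.61.shared_head
```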
Contributor


high

While the change to add a prefix is correct, I've noticed that the SharedHead class is duplicated. An identical implementation exists in vllm/model_executor/models/glm4_moe_mtp.py. To adhere to the DRY (Don't Repeat Yourself) principle and improve long-term maintainability, this class should be defined in a single, shared location, such as vllm/model_executor/models/utils.py, and then imported where needed. This would prevent future inconsistencies and simplify changes like the one in this PR, as it would only need to be applied once.
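The suggested refactor would look roughly like this; the module path follows the comment's suggestion, and the class body is a trivial stand-in rather than the real implementation:

```python
# Hypothetical single source of truth, e.g. in
# vllm/model_executor/models/utils.py:
class SharedHead:
    def __init__(self, prefix: str = "") -> None:
        self.prefix = prefix

# deepseek_mtp.py and glm4_moe_mtp.py would then both import it:
#   from vllm.model_executor.models.utils import SharedHead
# so a prefix change like this PR's would only be applied once.
head = SharedHead(prefix="model.layers.61.shared_head")
print(head.prefix)  # → model.layers.61.shared_head
```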

@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions bot added the stale (Over 90 days of inactivity) label on Jan 19, 2026
@github-actions

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

@github-actions bot closed this on Feb 18, 2026
