[Model][2/N] Remove deepseek_mtp modeling. #3561
wangxiyuan merged 3 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request is a good refactoring step that removes the custom deepseek_mtp implementation in favor of the upstream vllm version. However, the adaptation to the new API is incomplete in vllm_ascend/spec_decode/mtp_proposer.py. Several method calls are missing the required sampling_metadata argument, which will cause runtime errors. I have provided critical comments with suggestions to fix these issues.
-        previous_hidden_states=self.hidden_states[:num_input_tokens],
-        kv_caches=self.runner.kv_caches[-1:])
+        hidden_states=self.hidden_states[:num_input_tokens])
The call to self.model.forward() is missing the required sampling_metadata argument. The _propose function has access to sampling_metadata, which should be passed to the forward call.
-        hidden_states=self.hidden_states[:num_input_tokens])
+        hidden_states=self.hidden_states[:num_input_tokens],
+        sampling_metadata=sampling_metadata)
     sample_hidden_states = hidden_states[last_token_indices]
-    logits = self.model.compute_logits(sample_hidden_states, None)
+    logits = self.model.compute_logits(sample_hidden_states)
The call to self.model.compute_logits() is missing the required sampling_metadata argument. The _propose function has access to sampling_metadata, which should be passed to the compute_logits call.
-    logits = self.model.compute_logits(sample_hidden_states)
+    logits = self.model.compute_logits(sample_hidden_states, sampling_metadata)
The sampling_metadata parameter only exists in CustomDeepSeekMTP. We now use vLLM's DeepSeekMTP directly, which does not take this parameter.
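For context, a minimal self-contained sketch contrasting the two interfaces discussed here; CustomMTPStub and UpstreamMTPStub are hypothetical stand-ins, and only the compute_logits signatures mirror the diff and the reply above:

```python
import torch

class CustomMTPStub:
    """Stand-in for the removed CustomDeepSeekMTP adapter."""
    def compute_logits(self, hidden_states: torch.Tensor, sampling_metadata):
        # The old adapter threaded sampling_metadata through to the logits step.
        return hidden_states  # placeholder for the real logits computation

class UpstreamMTPStub:
    """Stand-in for vLLM's upstream DeepSeekMTP."""
    def compute_logits(self, hidden_states: torch.Tensor):
        # Upstream computes logits from the sampled hidden states alone.
        return hidden_states  # placeholder for the real logits computation

sample_hidden_states = torch.zeros(1, 8)
logits_old = CustomMTPStub().compute_logits(sample_hidden_states, None)
logits_new = UpstreamMTPStub().compute_logits(sample_hidden_states)
```

So the suggestion above would reintroduce an argument that the upstream model no longer accepts.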
The adaptation in quant_config can be removed after PR vllm-project/vllm#27193 is merged.
# Currently mlapo only supports W8A8 quantization in MLA scenario
# TODO(whx): modify this limitation when mlapo supports floating point
if self.fused_qkv_a_proj is None or not isinstance(
        getattr(self.fused_qkv_a_proj.quant_method, 'quant_method',
I will modify this to enable mlapo in both unquantized and W8A8-static scenarios when mlapo supports floating point. I think we still need this check to disable mlapo in other unsupported situations such as W8A8-dynamic or W4A8 etc. cc @zzzzwwjj
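To make the intent of that check concrete, here is a hypothetical, self-contained sketch; W8A8StaticLinearMethod and mlapo_supported are placeholder names rather than the real vllm-ascend classes, and only the shape of the guard follows the diff above:

```python
class W8A8StaticLinearMethod:
    """Placeholder for the real Ascend W8A8-static linear quant method."""
    pass

def mlapo_supported(fused_qkv_a_proj) -> bool:
    # mlapo stays enabled only when the fused qkv_a projection is quantized
    # with W8A8-static; None (unquantized), W8A8-dynamic, W4A8, etc. keep it off.
    if fused_qkv_a_proj is None:
        return False
    inner = getattr(fused_qkv_a_proj.quant_method, 'quant_method', None)
    return isinstance(inner, W8A8StaticLinearMethod)

print(mlapo_supported(None))  # False: no fused qkv_a projection at all
```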
Signed-off-by: whx-sjtu <2952154980@qq.com>
This is a bug fix missed in PR #3561.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: whx-sjtu <2952154980@qq.com>

This PR is step 2 of the deepseek model refactoring and removes deepseek_mtp.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: whx-sjtu <2952154980@qq.com>