
[Model][2/N] Remove deepseek_mtp modeling. #3561

Merged
wangxiyuan merged 3 commits into vllm-project:main from whx-sjtu:rm_ds_mtp
Oct 21, 2025

Conversation

@whx-sjtu (Collaborator) commented Oct 20, 2025

This PR is step 2 of deepseek model refactoring and removes deepseek_mtp.

@github-actions (Contributor) commented:

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill in the PR description and write a clear commit message to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (Bot, Contributor) left a comment:


Code Review

This pull request is a good refactoring step that removes the custom deepseek_mtp implementation in favor of the upstream vllm version. However, the adaptation to the new API is incomplete in vllm_ascend/spec_decode/mtp_proposer.py. Several method calls are missing the required sampling_metadata argument, which will cause runtime errors. I have provided critical comments with suggestions to fix these issues.

Comment thread: vllm_ascend/spec_decode/mtp_proposer.py
Comment thread: vllm_ascend/spec_decode/mtp_proposer.py (Outdated)

    -            previous_hidden_states=self.hidden_states[:num_input_tokens],
    -            kv_caches=self.runner.kv_caches[-1:])
    +            hidden_states=self.hidden_states[:num_input_tokens])
@gemini-code-assist (Bot) — critical

The call to self.model.forward() is missing the required sampling_metadata argument. The _propose function has access to sampling_metadata, which should be passed to the forward call.

Suggested change:

    -            hidden_states=self.hidden_states[:num_input_tokens])
    +            hidden_states=self.hidden_states[:num_input_tokens],
    +            sampling_metadata=sampling_metadata)


         sample_hidden_states = hidden_states[last_token_indices]
    -    logits = self.model.compute_logits(sample_hidden_states, None)
    +    logits = self.model.compute_logits(sample_hidden_states)
@gemini-code-assist (Bot) — critical

The call to self.model.compute_logits() is missing the required sampling_metadata argument. The _propose function has access to sampling_metadata, which should be passed to the compute_logits call.

Suggested change:

    -logits = self.model.compute_logits(sample_hidden_states)
    +logits = self.model.compute_logits(sample_hidden_states, sampling_metadata)

@whx-sjtu (Collaborator, Author) replied:

The sampling_metadata parameter only exists in CustomDeepSeekMTP. We now use vLLM's DeepSeekMTP directly, which doesn't take this parameter.
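To make the interface difference concrete, here is a minimal, self-contained sketch. The stub classes below are illustrative stand-ins, not vLLM's actual implementations; they only model the signature change whx-sjtu describes, and show why the bot's suggested edit would fail against the upstream class.

```python
# Hypothetical stubs for illustration only -- not vLLM's real classes.
# They model the interface difference described above: the removed
# CustomDeepSeekMTP took a sampling_metadata argument, while upstream
# vLLM's DeepSeekMTP does not.

class CustomDeepSeekMTPStub:
    """Stands in for the removed vllm-ascend CustomDeepSeekMTP."""

    def compute_logits(self, sample_hidden_states, sampling_metadata):
        # Old interface: extra sampling_metadata parameter (often passed as None).
        return [2 * h for h in sample_hidden_states]


class DeepSeekMTPStub:
    """Stands in for upstream vLLM's DeepSeekMTP after this refactor."""

    def compute_logits(self, sample_hidden_states):
        # New interface: no sampling_metadata parameter.
        return [2 * h for h in sample_hidden_states]


hidden = [1, 2, 3]

# Old call site shape: compute_logits(sample_hidden_states, None)
assert CustomDeepSeekMTPStub().compute_logits(hidden, None) == [2, 4, 6]

# New call site shape: compute_logits(sample_hidden_states)
assert DeepSeekMTPStub().compute_logits(hidden) == [2, 4, 6]

# Applying the bot's suggestion to the new interface raises a TypeError,
# which is why the suggestion was declined.
try:
    DeepSeekMTPStub().compute_logits(hidden, None)
except TypeError:
    print("new interface takes no sampling_metadata")
```

Under this reading, the bot flagged a real signature mismatch but proposed the fix in the wrong direction: the call sites were updated to match the upstream interface rather than the removed custom one.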

@whx-sjtu (Collaborator, Author) commented:

The adaptation in quant_config can be removed after PR vllm-project/vllm#27193 is merged.

@whx-sjtu force-pushed the rm_ds_mtp branch 3 times, most recently from 5fac93d to dc2648f on October 21, 2025 at 07:38
Comment thread: vllm_ascend/attention/mla_v1.py

    # Currently mlapo only supports W8A8 quantization in MLA scenario
    # TODO(whx): modify this limitation when mlapo supports floating point
    if self.fused_qkv_a_proj is None or not isinstance(
            getattr(self.fused_qkv_a_proj.quant_method, 'quant_method',
@whx-sjtu (Collaborator, Author) commented:

I will modify this to enable mlapo in both unquantized and W8A8-static scenarios once mlapo supports floating point. I think we still need this check to disable mlapo in other unsupported situations, such as W8A8-dynamic or W4A8. cc @zzzzwwjj
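The gating idea in this thread can be sketched as follows. The scheme names and helper function are hypothetical, chosen only to illustrate the allow-list pattern; they are not vllm-ascend's actual API:

```python
# Illustrative sketch only -- the scheme strings and helper below are
# hypothetical, not vllm-ascend's real API. The idea mirrors the thread:
# enable mlapo only for quantization schemes it is known to support, so
# anything else (e.g. W8A8-dynamic, W4A8) stays on the non-fused path.

# Assumed target set once mlapo gains floating-point support.
MLAPO_SUPPORTED_SCHEMES = {"unquantized", "w8a8_static"}

def mlapo_enabled(quant_scheme: str) -> bool:
    """Return True only for schemes mlapo is known to support."""
    return quant_scheme in MLAPO_SUPPORTED_SCHEMES

# Supported scenarios pass the check...
assert mlapo_enabled("w8a8_static")
# ...while unsupported ones remain disabled.
assert not mlapo_enabled("w8a8_dynamic")
assert not mlapo_enabled("w4a8")
```

An allow-list check like this fails closed: new or unknown quantization schemes are disabled by default until explicitly verified, which is the behavior the comment argues for.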

Signed-off-by: whx-sjtu <2952154980@qq.com>
@whx-sjtu added the ready (read for review) and ready-for-test (start test by label for PR) labels on Oct 21, 2025
@wangxiyuan wangxiyuan merged commit 220df60 into vllm-project:main Oct 21, 2025
40 of 42 checks passed
wangxiyuan pushed a commit that referenced this pull request Oct 21, 2025
This fixes a bug introduced by PR #3561.

- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0

---------

Signed-off-by: whx-sjtu <2952154980@qq.com>

Labels

module:quantization, module:tests, ready (read for review), ready-for-test (start test by label for PR)


3 participants