[Model][2/N] Remove deepseek_mtp modeling. #3561
wangxiyuan merged 3 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request is a good refactoring step that removes the custom deepseek_mtp implementation in favor of the upstream vllm version. However, the adaptation to the new API is incomplete in vllm_ascend/spec_decode/mtp_proposer.py. Several method calls are missing the required sampling_metadata argument, which will cause runtime errors. I have provided critical comments with suggestions to fix these issues.
-        previous_hidden_states=self.hidden_states[:num_input_tokens],
-        kv_caches=self.runner.kv_caches[-1:])
+        hidden_states=self.hidden_states[:num_input_tokens])
The call to self.model.forward() is missing the required sampling_metadata argument. The _propose function has access to sampling_metadata, which should be passed to the forward call.
-        hidden_states=self.hidden_states[:num_input_tokens])
+        hidden_states=self.hidden_states[:num_input_tokens],
+        sampling_metadata=sampling_metadata)
     sample_hidden_states = hidden_states[last_token_indices]
-    logits = self.model.compute_logits(sample_hidden_states, None)
+    logits = self.model.compute_logits(sample_hidden_states)
The call to self.model.compute_logits() is missing the required sampling_metadata argument. The _propose function has access to sampling_metadata, which should be passed to the compute_logits call.
-    logits = self.model.compute_logits(sample_hidden_states)
+    logits = self.model.compute_logits(sample_hidden_states, sampling_metadata)
The sampling_metadata parameter only exists in CustomDeepSeekMTP. We now use vLLM's DeepSeekMTP directly, which does not take this parameter.
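For context, a minimal self-contained sketch contrasting the two interfaces discussed here; CustomMTPStub and UpstreamMTPStub are hypothetical stand-ins, and only the compute_logits signatures mirror the diff and the reply above:

```python
import torch

class CustomMTPStub:
    """Stand-in for the removed CustomDeepSeekMTP adapter."""
    def compute_logits(self, hidden_states: torch.Tensor, sampling_metadata):
        # The old adapter threaded sampling_metadata through to the logits step.
        return hidden_states  # placeholder for the real logits computation

class UpstreamMTPStub:
    """Stand-in for vLLM's upstream DeepSeekMTP."""
    def compute_logits(self, hidden_states: torch.Tensor):
        # Upstream computes logits from the sampled hidden states alone.
        return hidden_states  # placeholder for the real logits computation

sample_hidden_states = torch.zeros(1, 8)
logits_old = CustomMTPStub().compute_logits(sample_hidden_states, None)
logits_new = UpstreamMTPStub().compute_logits(sample_hidden_states)
```

So the suggestion above would reintroduce an argument that the upstream model no longer accepts.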
The adaptation in quant_config can be removed after PR vllm-project/vllm#27193 is merged.
# Currently mlapo only supports W8A8 quantization in MLA scenario
# TODO(whx): modify this limitation when mlapo supports floating point
if self.fused_qkv_a_proj is None or not isinstance(
        getattr(self.fused_qkv_a_proj.quant_method, 'quant_method',
I will modify this to enable mlapo in both unquantized and W8A8-static scenarios when mlapo supports floating point. I think we still need this check to disable mlapo in other unsupported situations such as W8A8-dynamic or W4A8 etc. cc @zzzzwwjj
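To make the intent of that check concrete, here is a hypothetical, self-contained sketch; W8A8StaticLinearMethod and mlapo_supported are placeholder names rather than the real vllm-ascend classes, and only the shape of the guard follows the diff above:

```python
class W8A8StaticLinearMethod:
    """Placeholder for the real Ascend W8A8-static linear quant method."""
    pass

def mlapo_supported(fused_qkv_a_proj) -> bool:
    # mlapo stays enabled only when the fused qkv_a projection is quantized
    # with W8A8-static; None (unquantized), W8A8-dynamic, W4A8, etc. keep it off.
    if fused_qkv_a_proj is None:
        return False
    inner = getattr(fused_qkv_a_proj.quant_method, 'quant_method', None)
    return isinstance(inner, W8A8StaticLinearMethod)

print(mlapo_supported(None))  # False: no fused qkv_a projection at all
```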
Signed-off-by: whx-sjtu <2952154980@qq.com>
This is a bug fix missed in PR #3561.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: whx-sjtu <2952154980@qq.com>

This PR is step 2 of the deepseek model refactoring and removes deepseek_mtp.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0
Signed-off-by: whx-sjtu <2952154980@qq.com>