[Bugfix] Fix in_profile_run in mtp_proposer dummy_run#5165
wangxiyuan merged 3 commits into vllm-project:main
Conversation
Signed-off-by: Zetong Li <slippersss@126.com>
Code Review
This pull request aims to fix a bug related to in_profile_run in mtp_proposer. The changes correctly add an is_profile parameter to dummy_run and pass it down. However, there are two critical issues. First, in mtp_proposer.py, the parameter passed to set_ascend_forward_context is named is_profile_run instead of the correct in_profile_run, which will cause the flag to be ignored. Second, the call to self.drafter.dummy_run in model_runner_v1.py now includes the is_profile argument, but other Proposer implementations (EagleProposer, NgramProposer, SuffixDecodingProposer) and the base Proposer interface have not been updated to accept this argument, which will lead to a TypeError at runtime.
     batch_descriptor=batch_descriptor,
-    is_mtp_model=True):
+    is_mtp_model=True,
+    is_profile_run=is_profile):
The parameter name is_profile_run is incorrect; set_ascend_forward_context expects in_profile_run. Because of the typo, the value of is_profile never reaches the in_profile_run parameter, which falls back to its default value (False). This makes the intended fix ineffective.
Suggested change:
-    is_profile_run=is_profile):
+    in_profile_run=is_profile):
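The failure mode described above can be reproduced in isolation. The sketch below uses a hypothetical stand-in for set_ascend_forward_context (the real signature may differ): when a function accepts **kwargs, a misspelled keyword is silently absorbed instead of raising an error, so the intended flag keeps its default value.

```python
# Hypothetical stand-in for set_ascend_forward_context: the real function
# takes in_profile_run plus other keyword arguments.
def set_forward_context(in_profile_run=False, **kwargs):
    # **kwargs silently swallows any misspelled keyword.
    return in_profile_run

# Misspelled keyword is absorbed by **kwargs; the flag stays False.
assert set_forward_context(is_profile_run=True) is False

# Correctly spelled keyword actually sets the flag.
assert set_forward_context(in_profile_run=True) is True
```

This is why the typo produces no error at runtime: the flag is simply ignored, which is harder to notice than a TypeError.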
     dummy_compute_logits=dummy_drafter_compute_logits,
-    in_graph_capturing=not force_attention)
+    in_graph_capturing=not force_attention,
+    is_profile=is_profile)
This change introduces the is_profile keyword argument to the dummy_run call. However, the Proposer interface and its other implementations (EagleProposer, NgramProposer, SuffixDecodingProposer) have not been updated to accept this argument. This will cause a TypeError at runtime if a proposer other than MtpProposer is used. To fix this, you should update the base Proposer interface in vllm_ascend/spec_decode/interface.py and all its subclasses to include is_profile=False in their dummy_run method signatures.
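The fix suggested in the comment above can be sketched as follows. The class and method names mirror the review comment, but the signatures are illustrative, not the actual vllm-ascend code: the point is that the base interface and every subclass accept is_profile=False uniformly, so the new keyword never raises a TypeError.

```python
# Illustrative sketch: add `is_profile=False` to the base Proposer
# interface and all subclasses (signatures are hypothetical).
class Proposer:
    def dummy_run(self, num_tokens, is_profile=False):
        raise NotImplementedError

class NgramProposer(Proposer):
    def dummy_run(self, num_tokens, is_profile=False):
        # Ngram proposal runs no model forward; the flag is simply accepted.
        return None

class MtpProposer(Proposer):
    def dummy_run(self, num_tokens, is_profile=False):
        # Forward the flag under the name the forward context expects.
        return {"num_tokens": num_tokens, "in_profile_run": is_profile}

# Every proposer now accepts the keyword without a TypeError.
for proposer in (NgramProposer(), MtpProposer()):
    proposer.dummy_run(8, is_profile=True)
```

Keeping the default at False means call sites that never profile need no changes.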
Signed-off-by: Zetong Li <slippersss@126.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Merge 'main' of https://github.com/vllm-project/vllm-ascend into eplb_refactor (52 commits), including:
- [Doc] Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
- [pref] qwen3_next add triton ops: fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
- [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
- [CI] Improve CI (vllm-project#5078)
- [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
- Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
- [Doc] Add a perf tune section (vllm-project#5127)
- [Image] Refactor image build (vllm-project#5175)
- [refactor] refactor weight trans nz and transpose (vllm-project#4878)
- [BugFix] Fix precision issue for LoRA feature (vllm-project#4141)
- [Doc] Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
- support basic long_seq feature st (vllm-project#5140)
- [Bugfix] install trition for test_custom_op (vllm-project#5112)
- [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
- [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
- [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
- [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
- [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
- [Doc] Refact benchmark doc (vllm-project#5173)
- [Nightly] Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
- ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
### What this PR does / why we need it?
This PR fixes a failure of `enable_force_load_balance` caused by the missing `in_profile_run` flag in `dummy_run` of mtp_proposer.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
By CI.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: Zetong Li <slippersss@126.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
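The plumbing the PR restores can be sketched end to end. All names below are simplified stand-ins for the real vllm-ascend code (the actual set_ascend_forward_context and dummy_run take many more arguments): the profiling dummy_run passes is_profile down, and the proposer forwards it to the forward-context manager under its expected name, in_profile_run.

```python
from contextlib import contextmanager

# Simplified stand-in for set_ascend_forward_context: records whether the
# forward pass is part of a profiling run.
@contextmanager
def set_ascend_forward_context(in_profile_run=False, **kwargs):
    yield {"in_profile_run": in_profile_run}

# Simplified stand-in for the drafter's dummy_run: forwards the flag
# under the name the context manager expects.
def drafter_dummy_run(num_tokens, is_profile=False):
    with set_ascend_forward_context(is_mtp_model=True,
                                    in_profile_run=is_profile) as ctx:
        return ctx["in_profile_run"]

# The profiling flag now reaches the forward context.
assert drafter_dummy_run(8, is_profile=True) is True
assert drafter_dummy_run(8) is False
```

Before the fix, the flag was never forwarded, so force load balancing saw in_profile_run=False even during profiling.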