[BugFix] Adapted Qwen3-Next to v0.11.2 (#4477)
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adapts the Qwen3-Next model to vLLM v0.11.2. The changes enable a previously skipped test and fix a variable-shadowing bug in the model implementation. The shadowing fix is correct and prevents a runtime error. I have also identified a potentially critical issue: an `IndexError` can occur if `batch_size` is zero, which would crash the server. I've left a comment with a code suggestion to make the code more robust against this edge case.
```diff
  tar_dtype = temp_core_attn_out[0].dtype
  tar_device = temp_core_attn_out[0].device
  tar_shape = list(temp_core_attn_out[0].shape)
  tar_shape[1] = non_spec_query_start_loc[-1]
  core_attn_out_non_spec = torch.empty(tar_shape,
                                       dtype=tar_dtype,
                                       device=tar_device)
  for b_idx in range(batch_size):
-     cur_core_attn_out = core_attn_out[b_idx]
+     cur_core_attn_out = temp_core_attn_out[b_idx]
      start, end = non_spec_query_start_loc[
          b_idx], non_spec_query_start_loc[b_idx + 1]
      core_attn_out_non_spec[:, start:end, ...] = cur_core_attn_out
```
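One plausible reading of the bug fixed above: the old line read from `core_attn_out`, a name that is only assigned later in the function, so the loop raised at runtime; the fix reads from the already-populated temporary list instead. A minimal, hypothetical pure-Python sketch of that failure mode (not the actual vLLM code):

```python
def buggy():
    # 'result' is a local assigned later in this function, so reading it
    # here raises UnboundLocalError (a subclass of NameError).
    for i in range(1):
        item = result[i]
    result = [item]
    return result

def fixed(temp):
    # Read from the temporary list that was populated earlier instead.
    out = []
    for i in range(len(temp)):
        out.append(temp[i])
    return out

try:
    buggy()
except NameError as exc:
    print("buggy raised:", type(exc).__name__)
print(fixed([1, 2]))  # [1, 2]
```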
There is a potential `IndexError` here. If `batch_size` is 0, `temp_core_attn_out` will be an empty list, and accessing `temp_core_attn_out[0]` at line 708 will raise an exception. While it seems unlikely for `batch_size` to be 0 when `num_prefills > 0`, it's safer to guard against this to prevent a server crash.
Additionally, `torch.cat(last_recurrent_state, dim=0)` at line 720 will also fail if `last_recurrent_state` is an empty list (when `batch_size` is 0).
I suggest wrapping this block and line 720 in a check for `batch_size > 0` and handling the `batch_size == 0` case separately by creating empty tensors for `core_attn_out_non_spec` and `last_recurrent_state`.
Here is a suggested implementation for lines 708-719. Please note that line 720 should also be moved inside the `if batch_size > 0:` block.

```python
if batch_size > 0:
    tar_dtype = temp_core_attn_out[0].dtype
    tar_device = temp_core_attn_out[0].device
    tar_shape = list(temp_core_attn_out[0].shape)
    tar_shape[1] = non_spec_query_start_loc[-1]
    core_attn_out_non_spec = torch.empty(tar_shape,
                                         dtype=tar_dtype,
                                         device=tar_device)
    for b_idx in range(batch_size):
        cur_core_attn_out = temp_core_attn_out[b_idx]
        start, end = non_spec_query_start_loc[
            b_idx], non_spec_query_start_loc[b_idx + 1]
        core_attn_out_non_spec[:, start:end, ...] = cur_core_attn_out
else:
    num_v_heads = self.num_v_heads // self.tp_size
    core_attn_out_non_spec = torch.empty(
        (1, 0, num_v_heads, self.head_v_dim),
        dtype=ssm_state.dtype,
        device=ssm_state.device)
```

Signed-off-by: drslark <slarksblood@qq.com>
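The guarded copy loop above can be sketched in pure Python, independent of PyTorch. The helper below is hypothetical (not vLLM's API): it copies per-batch segments into a flat output using the prefix-sum offsets in `query_start_loc`, and the empty-batch guard avoids indexing an empty list.

```python
# Minimal sketch of the suggested guard, with hypothetical names and data.
def gather_non_spec(per_batch_out, query_start_loc):
    batch_size = len(per_batch_out)
    if batch_size == 0:
        # Without this guard, per_batch_out[0] below would raise IndexError.
        return []
    total = query_start_loc[-1]      # prefix sum: total number of tokens
    flat = [None] * total
    for b_idx in range(batch_size):
        # Each batch's tokens occupy the half-open slice [start, end).
        start, end = query_start_loc[b_idx], query_start_loc[b_idx + 1]
        flat[start:end] = per_batch_out[b_idx]
    return flat

print(gather_non_spec([[1, 2], [3]], [0, 2, 3]))  # [1, 2, 3]
print(gather_non_spec([], [0]))                   # []
```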
### What this PR does / why we need it?
Adapted Qwen3-Next eager mode to `v0.11.2`.
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: drslark <slarksblood@qq.com>
### What this PR does / why we need it?
Adapted Qwen3-Next eager mode to `v0.11.2`.
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: drslark <slarksblood@qq.com>
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
### What this PR does / why we need it?
Adapted Qwen3-Next eager mode to `v0.11.2`.
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: drslark <slarksblood@qq.com>
Signed-off-by: tanqingshan (A) <50050625@china.huawei.com>
What this PR does / why we need it?
Adapted Qwen3-Next to `v0.11.2`.
For the program:
The output was:
The output now is:
Does this PR introduce any user-facing change?
N/A
How was this patch tested?