[DeepSeek] Separate deepseek v3.2 modeling from deepseek v2 #3531
wangxiyuan merged 1 commit into vllm-project:main from
Conversation
Signed-off-by: MengqingCao <cmq0113@163.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the DeepSeek model implementations by separating the v3.2 logic into its own file, deepseek_v3_2.py, from the v2 implementation. This is a good architectural improvement for clarity and maintenance. My review focuses on the new deepseek_v3_2.py file. I've identified a critical issue with monkey-patching a base vLLM class, which can cause unpredictable behavior and should be refactored. Additionally, there's an instance of code duplication that should be addressed to improve maintainability.
```python
DeepseekV2DecoderLayer.__init__ = CustomDeepseekV2DecoderLayer.__init__
```
Monkey-patching DeepseekV2DecoderLayer.__init__ is a dangerous practice. It modifies a base class from vllm globally at import time, which can lead to unexpected side effects and bugs that are difficult to trace. This is especially problematic as vllm_ascend/models/deepseek_v2.py performs a similar patch, making behavior dependent on import order.
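The hazard can be reproduced with a toy example (stand-in classes only, not the real vLLM types): when two modules each patch the base class at import time, whichever import runs last silently wins for every user of the base class.

```python
# Illustrative only: a toy reproduction of the import-order hazard,
# using stand-in classes instead of the real vLLM ones.

class DecoderLayer:                 # stands in for DeepseekV2DecoderLayer
    def __init__(self):
        self.flavor = "base"

class PatchA(DecoderLayer):         # stands in for the v2 custom layer
    def __init__(self):
        self.flavor = "patched-by-A"

class PatchB(DecoderLayer):         # stands in for the v3.2 custom layer
    def __init__(self):
        self.flavor = "patched-by-B"

# "Module A" patches the base class at import time...
DecoderLayer.__init__ = PatchA.__init__
# ...then "module B" does the same. The last import wins, silently.
DecoderLayer.__init__ = PatchB.__init__

layer = DecoderLayer()
print(layer.flavor)  # "patched-by-B": every DecoderLayer user now gets B's init
```

Any code that still expects the unpatched (or A-patched) behavior now breaks, with no error pointing at the patch site.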
Please refactor to avoid monkey-patching. You can achieve this by:
- Removing this line.
- Using `CustomDeepseekV2DecoderLayer` directly in `AscendDeepseekV2Model`'s `make_layers` call (lines 102-106).
Here is how you can modify `AscendDeepseekV2Model`:

```python
# vllm_ascend/models/deepseek_v3_2.py:102
self.start_layer, self.end_layer, self.layers = make_layers(
    config.num_hidden_layers,
    lambda prefix: CustomDeepseekV2DecoderLayer(vllm_config, prefix,
                                                topk_indices_buffer),
    prefix=f"{prefix}.layers")
```

```python
class CustomDeepseekV2RowParallelLinear(RowParallelLinear):

    def __init__(
        self,
        input_size: int,
        output_size: int,
        bias: bool = True,
        input_is_parallel: bool = True,
        skip_bias_add: bool = False,
        params_dtype: Optional[torch.dtype] = None,
        reduce_results: bool = True,
        quant_config: Optional[QuantizationConfig] = None,
        prefix: str = "",
        *,
        return_bias: bool = True,
        disable_tp: bool = False,
    ):
        # Divide the weight matrix along the first dimension.
        self.tp_rank = (get_tensor_model_parallel_rank()
                        if not disable_tp else 0)
        self.tp_size = (get_tensor_model_parallel_world_size()
                        if not disable_tp else 1)
        self.input_size_per_partition = divide(input_size, self.tp_size)
        self.output_size_per_partition = output_size
        self.output_partition_sizes = [output_size]

        AscendLinearBase.__init__(self,
                                  input_size,
                                  output_size,
                                  skip_bias_add,
                                  params_dtype,
                                  quant_config,
                                  prefix,
                                  return_bias=return_bias,
                                  disable_tp=disable_tp)

        self.input_is_parallel = input_is_parallel
        self.reduce_results = reduce_results

        assert self.quant_method is not None
        self.quant_method.create_weights(
            layer=self,
            input_size_per_partition=self.input_size_per_partition,
            output_partition_sizes=self.output_partition_sizes,
            input_size=self.input_size,
            output_size=self.output_size,
            params_dtype=self.params_dtype,
            weight_loader=(
                self.weight_loader_v2 if self.quant_method.__class__.__name__
                in WEIGHT_LOADER_V2_SUPPORTED else self.weight_loader))
        if not reduce_results and (bias and not skip_bias_add):
            raise ValueError("When not reduce the results, adding bias to the "
                             "results can lead to incorrect results")

        if bias:
            self.bias = nn.Parameter(
                torch.empty(self.output_size, dtype=params_dtype))
            set_weight_attrs(self.bias, {
                "output_dim": 0,
                "weight_loader": self.weight_loader,
            })
        else:
            self.register_parameter("bias", None)
        self.update_param_tp_status()

    def forward(
        self,
        input_,
        is_prefill=True,
        is_force_scatter=False
    ) -> Union[torch.Tensor, tuple[torch.Tensor, Optional[nn.Parameter]]]:
        if self.input_is_parallel:
            input_parallel = input_
        else:
            tp_rank = get_tensor_model_parallel_rank()
            splitted_input = split_tensor_along_last_dim(
                input_, num_partitions=self.tp_size)
            input_parallel = splitted_input[tp_rank].contiguous()

        # Matrix multiply.
        assert self.quant_method is not None
        # Only fuse bias add into GEMM for rank 0 (this ensures that
        # bias will not get added more than once in TP>1 case)
        bias_ = None if (self.tp_rank > 0 or self.skip_bias_add) else self.bias
        output_parallel = self.quant_method.apply(self,
                                                  input_parallel,
                                                  bias=bias_)
        if self.reduce_results and self.tp_size > 1:
            output = tensor_model_parallel_all_reduce(output_parallel)
        else:
            output = output_parallel

        output_bias = self.bias if self.skip_bias_add else None

        if not self.return_bias:
            return output
        return output, output_bias
```
The `CustomDeepseekV2RowParallelLinear` class appears to be a reimplementation of `AscendRowParallelLinear` from `vllm_ascend.ops.linear`. This code duplication increases maintenance overhead and can lead to inconsistencies if one implementation is updated but the other is not.
To improve maintainability, please remove this duplicated class and use the existing `AscendRowParallelLinear` from `vllm_ascend.ops.linear` in `CustomDeepseekV2SFAAttention` (line 309).
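As a toy illustration of the principle (illustrative names only, not the real `vllm_ascend` classes): subclassing and overriding only what differs means a fix to the shared base propagates everywhere, whereas a copied class must be fixed in every copy.

```python
# Toy sketch of the dedup principle: subclass and override only what
# differs, instead of copying the whole implementation. Names mimic the
# review comment but the logic is a placeholder.

class RowParallelLinear:                    # stands in for the shared base
    def __init__(self, in_size, out_size):
        self.in_size, self.out_size = in_size, out_size

    def forward(self, x):
        return x * self.out_size            # placeholder for the real matmul

class AscendRowParallelLinear(RowParallelLinear):
    # Only the behavior that actually differs is overridden; everything
    # else is inherited, so a bug fix in the base propagates automatically.
    def forward(self, x, is_prefill=True):
        return super().forward(x) + (1 if is_prefill else 0)

layer = AscendRowParallelLinear(4, 8)
print(layer.forward(2))                     # 17 (2 * 8, plus 1 for prefill)
```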
@wangxiyuan @whx-sjtu please take a look, thanks!
```diff
 ModelRegistry.register_model(
     "DeepseekV32ForCausalLM",
-    "vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM")
+    "vllm_ascend.models.deepseek_v3_2:CustomDeepseekV3ForCausalLM")
```
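For context, a minimal sketch of how `"module:Class"` string registration typically resolves lazily — this is an assumption-based toy, not vLLM's actual `ModelRegistry` implementation:

```python
# Toy registry mirroring the register_model("Name", "module:Class") pattern.
# importlib resolves the class only when it is first requested, so the heavy
# model module is not imported at registration time.
import importlib

_registry: dict[str, str] = {}

def register_model(name: str, path: str) -> None:
    _registry[name] = path

def resolve_model(name: str):
    module_name, _, class_name = _registry[name].partition(":")
    return getattr(importlib.import_module(module_name), class_name)

# Using a stdlib class as a stand-in for a model class:
register_model("OrderedDictModel", "collections:OrderedDict")
cls = resolve_model("OrderedDictModel")
print(cls.__name__)  # "OrderedDict"
```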
Is deepseek_v2 no longer supported?
Deepseek v2 is still supported; it is registered at L34:
https://github.com/vllm-project/vllm-ascend/pull/3531/files#diff-20d47cd355c22411f1a056e83a551376d54b2d0612d230c98c507e00dfd9ad04R34
…ject#3531) Separate deepseek v3.2 modeling from deepseek v2 - CI passed with existing test. - Tested deepseek v3.2 locally - vLLM version: v0.11.0rc3 - vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0 Signed-off-by: MengqingCao <cmq0113@163.com>
What this PR does / why we need it?
Separate deepseek v3.2 modeling from deepseek v2
How was this patch tested?
- CI passed with existing tests.
- Tested deepseek v3.2 locally.
- vLLM version: v0.11.0rc3
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0