[0.9.1] Add LMhead TP communication groups. #1956
ganyi1996ppo merged 10 commits into vllm-project:v0.9.1-dev from
Conversation
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: angazenn <zengyanjia@huawei.com>
```python
if not with_prefill:
    padded_num_indices = num_tokens
else:
    padded_num_indices = max_num_reqs
```
Will padding here increase latency when the DP load is seriously uneven?
Yes, there might be performance degradation. However, in some cases (see `_get_forward_metadata_across_dp`) the all_reduce communication used for gathering metadata is skipped, so using the true `num_tokens_across_dp` would incur another all_reduce in those cases. Maybe we can find a better solution for this.
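As a rough sketch of the trade-off described above (names are taken from the diff; this is not the actual implementation):

```python
def pick_padded_num_indices(num_tokens: int, max_num_reqs: int,
                            with_prefill: bool) -> int:
    # Decode-only steps keep the true token count; prefill steps pad up to
    # max_num_reqs so every DP rank works on the same shape and the
    # metadata-gathering all_reduce can be skipped.
    return num_tokens if not with_prefill else max_num_reqs
```

The cost is extra padded work when the DP load is uneven; the benefit is one fewer all_reduce per step.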
```python
backend,
group_name="mc2")
```

```python
all_ranks = torch.arange(world_size).reshape(-1, lm_head_tp_size)
```
TODO: please create this parallel group only when running DeepSeek
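For illustration, the rank grouping built by `torch.arange(world_size).reshape(-1, lm_head_tp_size)` can be reproduced in plain Python (the helper name below is hypothetical):

```python
def lmhead_tp_groups(world_size: int, lm_head_tp_size: int) -> list[list[int]]:
    # Each row of the reshaped arange is one LMHead TP communication group,
    # e.g. world_size=8, lm_head_tp_size=4 -> [[0, 1, 2, 3], [4, 5, 6, 7]].
    assert world_size % lm_head_tp_size == 0
    return [list(range(start, start + lm_head_tp_size))
            for start in range(0, world_size, lm_head_tp_size)]
```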
Signed-off-by: zengyanjia <z00883269@china.huawei.com>
```python
False)  # Whether to enable DeepSeek models' prefill optimizations
self.enable_cpu_binding = additional_config.get(  # Whether to enable CPU binding
    "enable_cpu_binding", False)
self.lmhead_tp_size = additional_config.get("lmhead_tp_size", -1)
```
It would be better for the default value to be 1.
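For context, the fallback behavior described in the PR text (the default -1 resolves to `tensor_parallel_size`, and TP > 1 forces the lmhead group back to `tensor_parallel_size`) can be sketched as follows; the helper name is hypothetical, not the actual code:

```python
def resolve_lmhead_tp_size(lmhead_tp_size: int,
                           tensor_parallel_size: int) -> int:
    # -1 (the default) falls back to tensor_parallel_size; with TP > 1 the
    # lmhead group is also forced back to tensor_parallel_size so the normal
    # TP+DP path is used.
    if lmhead_tp_size == -1 or tensor_parallel_size > 1:
        return tensor_parallel_size
    return lmhead_tp_size
```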
What this PR does / why we need it?
In pure DP scenarios (such as DP32), LMHead computation takes 1~2 ms. In this PR we customize the parallelism of the LMHead, enabling a separate TP group for it. The computation flow is as follows:
This can save 0.5~1 ms for DeepSeek with 28 BS on a single die, with MTP.
In addition, this PR also fixes a bug introduced by LMHead quantization: the op
`npu_quant_matmul` only accepts dim < 65536, while `vocab_size` is > 65536 when using TP 1. We can set the lmhead TP size > 1 to avoid this bug. Main version of this PR: #2309.
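To see why lmhead TP > 1 side-steps the dim limit, note that each rank's LMHead shard only covers `vocab_size / lmhead_tp_size` columns. A small sketch (the 65536 limit is from the PR text; the ~129k vocabulary size is an illustrative assumption):

```python
NPU_QUANT_MATMUL_MAX_DIM = 65536  # limit stated in the PR description

def vocab_shard_size(vocab_size: int, lmhead_tp_size: int) -> int:
    # With lmhead TP, each rank's quantized matmul only sees its own
    # slice of the vocabulary dimension.
    return vocab_size // lmhead_tp_size

# e.g. an assumed ~129k vocabulary exceeds the limit at TP 1 but fits at TP 2:
assert vocab_shard_size(129280, 1) > NPU_QUANT_MATMUL_MAX_DIM
assert vocab_shard_size(129280, 2) < NPU_QUANT_MATMUL_MAX_DIM
```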
Does this PR introduce any user-facing change?
Yes. We introduced another configurable option, `lmhead_tp_size`, in `ascend_config`. For example:

The default value is -1, and `lmhead_tp_size` is automatically set to `tensor_parallel_size` in this case. Besides, it is suggested to use this option when running full DP, to avoid the additional communication introduced by TP. Therefore, if TP > 1, the parallel size of the `lmhead` group will also be changed to `tensor_parallel_size`, so as to fall back to the normal TP+DP case.

How was this patch tested?