[Doc] Add the user_guide doc file regarding fine-grained TP. #5084

jianzs merged 8 commits into vllm-project:main
Conversation
Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Code Review
This pull request adds user documentation for the Fine-Grained Tensor Parallelism feature. The documentation is comprehensive, covering the overview, benefits, usage, and deployment recommendations. However, I've identified a few critical issues. The documentation for mlp_tensor_parallel_size is misleading due to an underlying bug where its value is incorrectly tied to embedding_tensor_parallel_size. Additionally, the documentation for o_proj is missing a key restriction regarding its use in prefill-decode disaggregation scenarios. Finally, there is a syntax error in one of the configuration examples. These issues should be addressed to ensure the documentation is accurate and prevents user errors.
### Component & Execution Mode Support

- **`embedding`, `lm_head`, and `mlp`**: Can be configured with fine-grained TP in any execution context—prefill, decode, or mixed deployment.
- **`o_proj`**: Currently, fine-grained TP for the attention output projection is **only supported in graph-capture mode** (e.g., CUDA Graphs). It cannot be enabled in eager execution.
The documentation for o_proj fine-grained TP is incomplete. The implementation in vllm_ascend/ascend_config.py (lines 195-198) reveals an additional critical restriction: oproj_tensor_parallel_size is only supported in prefill-decode (PD) disaggregation scenarios and can only be used on the decode (consumer) nodes. This should be explicitly mentioned to prevent users from encountering runtime errors in unsupported configurations.
I suggest updating the line to be more explicit about all restrictions:
- **`o_proj`**: Currently, fine-grained TP for the attention output projection has two key restrictions:
- It is **only supported in graph-capture mode** (e.g., CUDA Graphs) and cannot be enabled in eager execution.
  - It is **only supported in prefill-decode (PD) disaggregation scenarios and must be used on a decode (consumer) node**.
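The two restrictions above can be sketched as a small validation helper. This is a simplified, hypothetical re-implementation for illustration only; the actual check lives in `vllm_ascend/ascend_config.py`, and the function name, signature, and role string used here are assumptions.

```python
def validate_oproj_tp(oproj_tensor_parallel_size,
                      enforce_eager,
                      kv_transfer_role):
    """Reject configurations the review flags as unsupported:
    o_proj fine-grained TP needs graph-capture mode and a decode
    (kv consumer) node in a PD-disaggregated deployment.

    Hypothetical sketch -- not the real vllm-ascend check.
    """
    if oproj_tensor_parallel_size is None:
        return  # feature disabled, nothing to validate

    if enforce_eager:
        raise ValueError(
            "oproj_tensor_parallel_size requires graph-capture mode; "
            "it cannot be used with enforce_eager=True")

    if kv_transfer_role != "kv_consumer":
        raise ValueError(
            "oproj_tensor_parallel_size is only supported on decode "
            "(kv_consumer) nodes in prefill-decode disaggregation")
```

With this shape, enabling `oproj_tensor_parallel_size` in eager mode or on a prefill (producer) node fails fast at config time rather than at runtime.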
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Force-pushed ff932a5 to d700184
Co-authored-by: chenxiao <Jaychou1620@Gmail.com>
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com>
What this PR does / why we need it?
Add user guide for Fine-Grained Tensor Parallelism feature.
Documents usage, supported components (`embedding`, `lm_head`, `o_proj`, `mlp`/`dense_ffn`), model compatibility, and deployment guidelines.

Functionality implemented in:
- add mlp tp optimze #2120 (MLP TP)
- [feat]: oproj tensor parallelism in pure DP and graph-mode scenarios. #2167 (OProj TP)
- [Feat]: Add custom lmhead tensor model parallel #2309 (LM Head TP)
- [Feat] Add custom Embedding tensor model parallel #2616 (Embedding TP)
- [Feat] Support MLP_TP feature, exclude MOE layer #4999 (Dense FFN TP)
vLLM version: v0.12.0
vLLM main: vllm-project/vllm@ad32e3e
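As a rough sketch of how a user might enable these options: the key names below are taken from the review comments in this thread (`oproj_tensor_parallel_size`, `embedding_tensor_parallel_size`, `mlp_tensor_parallel_size`), while the model path, values, and overall command shape are assumptions — consult the merged user guide for the authoritative form.

```shell
# Sketch only: option names follow the review comments above; the model
# path, sizes, and command shape are illustrative assumptions.
vllm serve /path/to/model \
  --tensor-parallel-size 8 \
  --additional-config '{
    "oproj_tensor_parallel_size": 2,
    "embedding_tensor_parallel_size": 2,
    "mlp_tensor_parallel_size": 2
  }'
```

Note that, per the review above, `oproj_tensor_parallel_size` would additionally require graph-capture mode and a decode (kv consumer) node in a PD-disaggregated deployment.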