[Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide by DreamerLeader · Pull Request #6126 · vllm-project/vllm-ascend

DreamerLeader · 2026-01-22T07:49:02Z

What this PR does / why we need it?

Fix of Pooling Code and Update of Pooling Usage Guide

Does this PR introduce any user-facing change?

How was this patch tested?

Related PR: [Bugfix] Fixed precision issues caused by pooled request pooling
https://github.com//pull/6049
Ready for review.

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@d682094

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Signed-off-by: fangjianwei <f30058701@china.huawei.com>

github-actions · 2026-01-22T07:49:22Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.


wangxiyuan · 2026-01-22T08:25:31Z

Fix CI and update the commit message.


Pz1116 · 2026-01-22T11:09:40Z

+
+This is because HCCL one-sided communication connections are created lazily, after the instance is launched, the first time Device-to-Device communication is involved. Currently, full-mesh connections between all devices are required. Establishing these connections introduces a one-time latency overhead and persistent device memory consumption (4 MB of device memory per connection).
+
+**For warm-up, it is recommended to issue requests with an input sequence length of 8K and an output sequence length of 1, with the total number of requests being 2–3× the number of devices (cards/dies).**


LGTM

TODO: Add an equation for Device memory consumption of HCCL link creation.
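
As a rough illustration of the numbers quoted in the review comment above (not the equation the TODO asks for), the sketch below computes the recommended warm-up request count and a back-of-the-envelope memory estimate. The function names are hypothetical, and the memory estimate rests on an assumption the source does not confirm: that "full-mesh" means each device holds one connection to every other device, each costing the quoted 4 MB.

```python
def warmup_request_range(num_devices: int) -> tuple[int, int]:
    """Return the (min, max) warm-up request count: 2x to 3x the device count."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return 2 * num_devices, 3 * num_devices


def full_mesh_link_memory_mb(num_devices: int, mb_per_connection: float = 4.0) -> float:
    """Estimate per-device memory held by HCCL links, in MB.

    ASSUMPTION (not from the source): in a full mesh, each device keeps
    one connection to each of the other (num_devices - 1) devices.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return (num_devices - 1) * mb_per_connection


if __name__ == "__main__":
    devices = 16  # example value, chosen only for illustration
    low, high = warmup_request_range(devices)
    print(f"Warm-up requests (8K input, 1 output token): {low} to {high}")
    print(f"Estimated link memory per device: {full_mesh_link_memory_mb(devices)} MB")
```

If the actual connection topology or per-link cost differs, only `full_mesh_link_memory_mb` needs to change; the warm-up count follows directly from the quoted 2–3× guidance.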

github-actions · 2026-01-24T14:51:48Z

This pull request has conflicts. Please resolve them before we can evaluate the pull request.

Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

…to qwen3next_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (59 commits)
  [Feat.]: 310p support MOE models (vllm-project#6530)
  [Doc] backport 0.13.0 release note (vllm-project#6584)
  [CI] Update UT CANN version to 8.5.0 for main branch (vllm-project#6564)
  [CI] Change A2 runner (vllm-project#6557)
  [Bugfix] Fix the incorrect use of the output parameter in _forward_fia_slidingwindow (vllm-project#6469)
  [main2main] upgrade vllm main 0202 (vllm-project#6560)
  [CI][npugraph_ex]Fix npugraph ex e2e test (vllm-project#6553)
  [Feature]KV pool supports sparse attention (vllm-project#6339)
  [bugfix]Fix accuracy issue in PCP/DCP with speculative decoding (vllm-project#6491)
  perf: adaptive block size selection in linear_persistent kernel (vllm-project#6537)
  [ModelRunner][Fix] Pads query_start_loc to satisfy FIA/TND constraint (vllm-project#6475)
  [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide (vllm-project#6126)
  [Fusion] Add rmsnorm dynamic quant fusion pass (vllm-project#6274)
  [Bugfix] Synchronize only the current stream to avoid device sync (vllm-project#6432)
  [CI] Add long and short prompt tests for DeepSeek-V3.2 (vllm-project#6499)
  [Refactor] MLP weight prefetch to consistency with MoE Model's prefetching in terms of code and usage (vllm-project#6442)
  [bugfix][npugraph_ex]duplicate pattern issue (vllm-project#6513)
  [bugfix][npugraph_ex]add the extra check for allreduce rmsnorm fusion pass (vllm-project#6430)
  [Quant] GLM4.7-Flash Support W8A8 (vllm-project#6492)
  [Nightly][BugFix] Remove kv_cache nz test case for test_mla_preprocess_nq.py (vllm-project#6505)
  ...

…roject#6126) ### What this PR does / why we need it? Fix of Pooling Code and Update of Pooling Usage Guide ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? pr:[[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) readyhttps://github.com/vllm-project/pull/6049 read for review - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@d682094 --------- Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local> Signed-off-by: fangjianwei <f30058701@china.huawei.com> Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com> Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local> Co-authored-by: fangjianwei <f30058701@china.huawei.com> Signed-off-by: momochenchuw <chenchuw@huawei.com>

…roject#6126) ### What this PR does / why we need it? Fix of Pooling Code and Update of Pooling Usage Guide ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? pr:[[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) readyhttps://github.com/vllm-project/pull/6049 read for review - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@d682094 --------- Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local> Signed-off-by: fangjianwei <f30058701@china.huawei.com> Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com> Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local> Co-authored-by: fangjianwei <f30058701@china.huawei.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>

…roject#6126) ### What this PR does / why we need it? Fix of Pooling Code and Update of Pooling Usage Guide ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? pr:[[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) readyhttps://github.com/vllm-project/pull/6049 read for review - vLLM version: v0.13.0 - vLLM main: vllm-project/vllm@d682094 --------- Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local> Signed-off-by: fangjianwei <f30058701@china.huawei.com> Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com> Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local> Co-authored-by: fangjianwei <f30058701@china.huawei.com>


房建伟 and others added 10 commits January 20, 2026 17:43

Fixed precision issues caused by pooled request pooling

12cf043

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Fixed precision issues caused by pooled request pooling

6cbe64b

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Fixed precision issues caused by pooled request pooling

d263169

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Fixed precision issues caused by pooled request pooling

4447997

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Fixed precision issues caused by pooled request pooling

0b0a3a5

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Merge branch 'vllm-project:main' into bug_fix_0120

9209ed2

Fixed precision issues caused by pooled request pooling

4708564

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Fixed precision issues caused by pooled request pooling

fca9b9e

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

Merge branch 'vllm-project:main' into bug_fix_0120

959f226

A2 Environment Pooling for Memcache Compatibility

86cf689

Signed-off-by: fangjianwei <f30058701@china.huawei.com>

DreamerLeader requested review from LCAIZJ, MengqingCao, Yikun and wangxiyuan as code owners January 22, 2026 07:49

github-actions Bot added the documentation label (Improvements or additions to documentation) Jan 22, 2026

Fix of Pooling Code and Update of Pooling Usage Guide

c45360c

Signed-off-by: fangjianwei <f30058701@china.huawei.com>

DreamerLeader changed the title ~~Fix of Pooling Code and Update of Pooling Usage Guide~~ [Bugfix]Fix of Pooling Code and Update of Pooling Usage Guide Jan 22, 2026

Fix of Pooling Code and Update of Pooling Usage Guide

64dad81

Signed-off-by: fangjianwei <f30058701@china.huawei.com>

wangxiyuan approved these changes Jan 22, 2026


Fix of Pooling Code and Update of Pooling Usage Guide

45921c6

Signed-off-by: fangjianwei <f30058701@china.huawei.com>

wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 22, 2026

Pz1116 reviewed Jan 22, 2026


github-actions Bot added the merge-conflicts label Jan 24, 2026

Merge branch 'main' into bug_fix_0120

d890b56

Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>

github-actions Bot removed the merge-conflicts label Jan 26, 2026

Merge branch 'vllm-project:main' into bug_fix_0120

a2811bd

DreamerLeader and others added 2 commits February 3, 2026 09:36

Merge branch 'vllm-project:main' into bug_fix_0120

875303e

Fixed precision issues caused by pooled request pooling

c58a2b6

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>

whx-sjtu merged commit 2dac18a into vllm-project:main Feb 4, 2026
17 checks passed

DreamerLeader deleted the bug_fix_0120 branch March 14, 2026 07:38

whx-sjtu merged 17 commits into vllm-project:main from DreamerLeader:bug_fix_0120
