[Bugfix] Fixed precision issues caused by pooled request pooling #6049
Merged
wangxiyuan merged 8 commits into vllm-project:main on Jan 20, 2026
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Contributor
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
wangxiyuan
approved these changes
Jan 20, 2026
added 3 commits
January 20, 2026 18:01
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
LCAIZJ
approved these changes
Jan 20, 2026
Collaborator
LGTM
wangxiyuan
approved these changes
Jan 20, 2026
Contributor
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
added 2 commits
January 20, 2026 23:11
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
zzzzwwjj
pushed a commit
that referenced
this pull request
Jan 20, 2026
### What this PR does / why we need it?
#6049 Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
845473182
pushed a commit
to 845473182/vllm-ascend
that referenced
this pull request
Jan 21, 2026
…to FIA_rebase

* 'main' of https://github.com/vllm-project/vllm-ascend: (24 commits)
  add dispath_ffn_combine_bf16 (vllm-project#5866)
  [BugFix] Fix input parameter bug of dispatch_gmm_combine_decode[RFC: issue 5476] (vllm-project#5932)
  [1/N][Feat] Xlite Qwen3 MoE Support (vllm-project#5951)
  [Bugfix] Fix setting of `speculative_config.enforce_eager` for dsv32 (vllm-project#5945)
  [bugfix][mm] change get_num_encoder_tokens to get_num_encoder_embeds in recompute_schedule.py (vllm-project#5132)
  [Bugfix] fix pcp qwen full graph FIA bug (vllm-project#6037)
  [Bugfix]Fixed precision issues caused by pooled request pooling (vllm-project#6049)
  【main】【bugfix】Resolved memory deallocation failure in the pooling layer under re-computation workloads. (vllm-project#6045)
  [main][Bugfix] Fixed an problem related to embeddings sharing (vllm-project#5967)
  [Feature]refactor the npugraph_ex config, support online-infer with static kernel (vllm-project#5775)
  [CI][Lint] Show lint diff on failure (vllm-project#5956)
  [CI] Add wait logic for each individual case (vllm-project#6036)
  [CI] Add DeepSeek-V3.2-W8A8 nightly ci test (vllm-project#4633)
  model runner v2 support triton of penalty (vllm-project#5854)
  [Docs][Model] Support Qwen3-VL-Embedding & Qwen3-VL-Reranker (vllm-project#6034)
  [Tests] move qwen3 performance test from nightly to e2e (vllm-project#5980)
  [Bugfix] fix bug of pcp+mtp+async scheduler (vllm-project#5994)
  [Main2Main] Upgrade vllm commit to releases/v0.14.0 (vllm-project#5988)
  [Ops] Add layernorm for qwen3Next (vllm-project#5765)
  [Doc] Add layer_sharding additional config for DeepSeek-V3.2-W8A8 (vllm-project#5921)
  ...
huangfeifei1995
pushed a commit
to huangfeifei1995/vllm-ascend
that referenced
this pull request
Jan 21, 2026
…-project#6049)

### What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: huangning1995 <huangning12@huawei.com>
huangfeifei1995
added a commit
to huangfeifei1995/vllm-ascend
that referenced
this pull request
Jan 21, 2026
…ng (vllm-project#6049)" This reverts commit fea0129.
starmountain1997
pushed a commit
to starmountain1997/vllm-ascend
that referenced
this pull request
Jan 31, 2026
…-project#6049)

### What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
starmountain1997
pushed a commit
to starmountain1997/vllm-ascend
that referenced
this pull request
Jan 31, 2026
…-project#6057)

### What this PR does / why we need it?
vllm-project#6049 Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
whx-sjtu
pushed a commit
that referenced
this pull request
Feb 4, 2026
### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](#6049) ready https://github.com//pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
chenchuw886
pushed a commit
to chenchuw886/vllm-ascend
that referenced
this pull request
Feb 12, 2026
…roject#6126)

### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) ready https://github.com/vllm-project/pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: momochenchuw <chenchuw@huawei.com>
tangtiangu
pushed a commit
to tangtiangu/jiusi-vllm-ascend
that referenced
this pull request
Feb 24, 2026
…-project#6057)

### What this PR does / why we need it?
vllm-project#6049 Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Feb 28, 2026
…-project#6049)

### What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Feb 28, 2026
…roject#6126)

### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) ready https://github.com/vllm-project/pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241
pushed a commit
to maoxx241/vllm-ascend
that referenced
this pull request
Mar 2, 2026
…-project#6049)

### What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
maoxx241
pushed a commit
to maoxx241/vllm-ascend
that referenced
this pull request
Mar 2, 2026
…roject#6126)

### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) ready https://github.com/vllm-project/pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Mar 4, 2026
…-project#6049)

### What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Mar 4, 2026
…roject#6126)

### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) ready https://github.com/vllm-project/pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
LCAIZJ
pushed a commit
to LCAIZJ/vllm-ascend
that referenced
this pull request
Mar 7, 2026
…-project#6049)

### What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling

### Does this PR introduce _any_ user-facing change?
pr6045

### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
LCAIZJ
pushed a commit
to LCAIZJ/vllm-ascend
that referenced
this pull request
Mar 7, 2026
…roject#6126)

### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) ready https://github.com/vllm-project/pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
jiangyunfan1
pushed a commit
to jiangyunfan1/vllm-ascend
that referenced
this pull request
Apr 9, 2026
…roject#6126)

### What this PR does / why we need it?
Fix of Pooling Code and Update of Pooling Usage Guide

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
pr: [[Bugfix]Fixed precision issues caused by pooled request pooling](vllm-project#6049) ready https://github.com/vllm-project/pull/6049 ready for review
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@d682094

---------

Signed-off-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Signed-off-by: fangjianwei <f30058701@china.huawei.com>
Signed-off-by: DreamerLeader <88812830+DreamerLeader@users.noreply.github.com>
Co-authored-by: 房建伟 <fangjianwei@fangjianweideMacBook-Air.local>
Co-authored-by: fangjianwei <f30058701@china.huawei.com>
What this PR does / why we need it?
Fixed precision issues caused by pooled request pooling
Does this PR introduce any user-facing change?
pr6045
How was this patch tested?