
[Model] Standardize pooling heads #32148

Merged
noooop merged 13 commits into vllm-project:main from DarkLight1337:standardize-head
Jan 12, 2026

Conversation

@DarkLight1337 (Member) commented Jan 12, 2026

Purpose

Follow-up to #32119: make pooling params actually take effect for custom models.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Unifies pooling head construction and applies pooling params consistently across models.

  • Introduces ActivationFn and refactors EmbeddingPoolerHead/ClassifierPoolerHead to accept projector, head_dtype, and activation; conditionally apply normalize/use_activation and dimensions slicing
  • Centralizes head instantiation in seqwise/tokwise poolers.py using get_current_vllm_config, _load_st_projector, PoolerNormalize, and resolve_classifier_act_fn
  • Updates BERT, BERT+RoPE, ModernBERT, and GritLM to use EmbeddingPoolerHead-based poolers; replace custom inline heads with configured projector+activation; simplify constructors to take VllmConfig
  • Ensures classifier heads support optional logit_bias and proper dtype; token heads mirror the same behavior
  • Expands __all__ and minor API cleanups (type hints, return values)

Written by Cursor Bugbot for commit 610deac.
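The head design described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not vLLM's actual code: the class and field names mirror the summary (`EmbeddingPoolerHead`, `PoolingParams`, `normalize`, `dimensions`), but the signatures are assumptions made for the example. The key idea is that the head receives its `projector` and `activation` as ready-made callables and applies per-request params (Matryoshka `dimensions` slicing, then L2 normalization) itself:

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

Vector = list[float]

@dataclass
class PoolingParams:
    """Per-request pooling options (illustrative subset)."""
    normalize: bool = True
    dimensions: Optional[int] = None  # Matryoshka truncation, if set

class EmbeddingPoolerHead:
    """Sketch of a head that takes its projector/activation as arguments
    instead of building them from global config (the PR's key change)."""

    def __init__(self,
                 projector: Optional[Callable[[Vector], Vector]] = None,
                 activation: Optional[Callable[[Vector], Vector]] = None):
        self.projector = projector
        self.activation = activation

    def __call__(self, pooled: Vector, params: PoolingParams) -> Vector:
        out = pooled
        if self.projector is not None:     # e.g. a SentenceTransformers Dense layer
            out = self.projector(out)
        if self.activation is not None:    # applied only when configured
            out = self.activation(out)
        if params.dimensions is not None:  # slice before normalizing
            out = out[:params.dimensions]
        if params.normalize:               # per-request L2 normalization
            norm = math.sqrt(sum(x * x for x in out)) or 1.0
            out = [x / norm for x in out]
        return out

head = EmbeddingPoolerHead()
vec = head([3.0, 4.0, 0.0], PoolingParams(normalize=True, dimensions=2))
# vec is the first two components, L2-normalized: [0.6, 0.8]
```

Slicing before normalizing matters for Matryoshka embeddings: the truncated vector must be re-normalized to unit length, which is why the sketch orders the steps this way.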


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 requested a review from noooop as a code owner January 12, 2026 05:58
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 12, 2026
@noooop noooop enabled auto-merge (squash) January 12, 2026 06:00
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request is a nice refactoring that standardizes the pooling heads. By moving the configuration logic out of the head classes and into factory functions, you've made the heads more modular, reusable, and easier to test. The changes are applied consistently across bert, gritlm, and modernbert models. I've found one critical issue where a variable could be used before assignment, which would cause a runtime error.
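The factory pattern the review praises can be sketched as follows. All names here (`ModelConfig`, `resolve_activation`, `ClassifierHead`) are hypothetical, chosen for illustration: the point is that configuration is resolved once, in a factory function outside the head, and the head only receives ready-made callables, which makes it trivially testable in isolation:

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ModelConfig:
    """Illustrative config; the real vLLM config is far richer."""
    use_sigmoid: bool = False

def resolve_activation(cfg: ModelConfig) -> Optional[Callable[[float], float]]:
    # The factory inspects config; the head itself never sees cfg.
    if cfg.use_sigmoid:
        return lambda x: 1.0 / (1.0 + math.exp(-x))
    return None

class ClassifierHead:
    """Head receives the activation ready-made; no config access inside."""
    def __init__(self, activation: Optional[Callable[[float], float]]):
        self.activation = activation

    def __call__(self, logit: float, use_activation: bool) -> float:
        # use_activation is a per-request pooling param, honored here.
        if use_activation and self.activation is not None:
            return self.activation(logit)
        return logit

head = ClassifierHead(resolve_activation(ModelConfig(use_sigmoid=True)))
score = head(0.0, use_activation=True)    # sigmoid(0) == 0.5
raw = head(2.0, use_activation=False)     # activation skipped per request
```

In a unit test, the head can now be constructed with any activation directly, with no need to mock a global config object.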

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@noooop noooop merged commit 8863c2b into vllm-project:main Jan 12, 2026
57 checks passed
@DarkLight1337 DarkLight1337 deleted the standardize-head branch January 12, 2026 17:10
TomerBN-Nvidia pushed a commit to TomerBN-Nvidia/vllm that referenced this pull request Jan 13, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
vllm-project/vllm#31916
vllm-project/vllm#32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
vllm-project/vllm#24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
vllm-project/vllm#31998

- Skip some pooling tests broken by
vllm-project/vllm#32148
where vLLM itself also fails:
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will reopen those tests once main2main reaches
vllm-project/vllm#32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
