[Model] Standardize pooling heads #32148
Merged
noooop merged 13 commits into vllm-project:main on Jan 12, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
noooop
approved these changes
Jan 12, 2026
Code Review
This pull request is a nice refactoring that standardizes the pooling heads. By moving the configuration logic out of the head classes and into factory functions, you've made the heads more modular, reusable, and easier to test. The changes are applied consistently across bert, gritlm, and modernbert models. I've found one critical issue where a variable could be used before assignment, which would cause a runtime error.
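The split the review describes can be sketched as follows. This is a minimal illustration, not the PR's code: `HeadConfig`, `PoolerHeadSketch`, and `build_head` are hypothetical names, and plain Python lists stand in for tensors. The head receives ready-made callables, while the factory makes every config-dependent decision. The review does not say where the use-before-assignment issue was found; the `else` branch below only illustrates the general pattern that avoids it.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
VectorFn = Callable[[Vector], Sequence[float]]


@dataclass
class HeadConfig:
    """Hypothetical stand-in for whatever configuration the real code reads."""
    task: str              # "embed" or "classify"
    normalize: bool = True


def l2_normalize(v: Vector) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sigmoid(v: Vector) -> list[float]:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


class PoolerHeadSketch:
    """Receives ready-made callables and never looks at the config itself."""

    def __init__(self, projector: Optional[VectorFn], activation: Optional[VectorFn]):
        self.projector = projector
        self.activation = activation

    def __call__(self, pooled: Vector) -> list[float]:
        out = list(pooled)
        if self.projector is not None:
            out = list(self.projector(out))
        if self.activation is not None:
            out = list(self.activation(out))
        return out


def build_head(config: HeadConfig) -> PoolerHeadSketch:
    """Factory: all config-dependent decisions live here."""
    if config.task == "embed":
        activation = l2_normalize if config.normalize else None
    elif config.task == "classify":
        activation = sigmoid
    else:
        # Without this branch, `activation` would be unbound at the return below.
        raise ValueError(f"unsupported task: {config.task!r}")
    return PoolerHeadSketch(projector=None, activation=activation)
```

Because the head takes plain callables, a test can construct `PoolerHeadSketch` directly with stub functions and never build a config object.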
e25bf36 to
610deac
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
TomerBN-Nvidia
pushed a commit
to TomerBN-Nvidia/vllm
that referenced
this pull request
Jan 13, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
wangxiyuan
pushed a commit
to vllm-project/vllm-ascend
that referenced
this pull request
Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)
- Modify import paths due to the refactors vllm-project/vllm#31916 and vllm-project/vllm#32054
- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional arguments but 3 were given` due to vllm-project/vllm#24498
- Skip the async-scheduling tests in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which were never verified (vllm-project/vllm#31998)
- Skip some pooling tests that fail because of vllm-project/vllm#32148, where vLLM's own CI also fails: https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4 We will re-enable those tests when main2main reaches vllm-project/vllm#32243
- Skip some cases in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are broken by vllm-project/vllm#32118
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
aipaes
pushed a commit
to aipaes/vllm-ascend
that referenced
this pull request
Jan 15, 2026
sammysun0711
pushed a commit
to sammysun0711/vllm
that referenced
this pull request
Jan 16, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
akh64bit
pushed a commit
to akh64bit/vllm
that referenced
this pull request
Jan 16, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
dsuhinin
pushed a commit
to dsuhinin/vllm
that referenced
this pull request
Jan 21, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
starmountain1997
pushed a commit
to starmountain1997/vllm-ascend
that referenced
this pull request
Jan 31, 2026
starmountain1997
pushed a commit
to starmountain1997/vllm-ascend
that referenced
this pull request
Jan 31, 2026
ItzDEXX
pushed a commit
to ItzDEXX/vllm
that referenced
this pull request
Feb 19, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Feb 28, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)
- Modify import paths due to the refactors vllm-project/vllm#31916 and vllm-project/vllm#32054
- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional arguments but 3 were given` due to vllm-project/vllm#24498
- Skip the async-scheduling tests in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which were never verified (vllm-project/vllm#31998)
- Skip some pooling tests that fail because of vllm-project/vllm#32148, where vLLM's own CI also fails: https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4 We will re-enable those tests when main2main reaches vllm-project/vllm#32243
- Skip some cases in `tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are broken by vllm-project/vllm#32118
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
maoxx241
pushed a commit
to maoxx241/vllm-ascend
that referenced
this pull request
Mar 2, 2026
ZRJ026
pushed a commit
to ZRJ026/vllm-ascend
that referenced
this pull request
Mar 4, 2026
LCAIZJ
pushed a commit
to LCAIZJ/vllm-ascend
that referenced
this pull request
Mar 7, 2026
Purpose
Follow-up to #32119: make pooling params actually take effect for custom models.
Test Plan
Test Result
Note
Unifies pooling head construction and applies pooling params consistently across models.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` to accept `projector`, `head_dtype`, and `activation`; conditionally apply `normalize`/`use_activation` and `dimensions` slicing
- […] `poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `EmbeddingPoolerHead`-based poolers; replace custom inline heads with configured projector+activation; simplify constructors to take `VllmConfig`
- […] `logit_bias` and proper dtype; token heads mirror the same behavior
- […] `__all__` and minor API cleanups (type hints, return values)

Written by Cursor Bugbot for commit 610deac. This will update automatically on new commits.
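The per-request behaviour this summary describes for the embedding head can be sketched as below. It is a rough illustration under stated assumptions: `RequestParams` and `EmbeddingHeadSketch` are hypothetical names that only borrow the field names mentioned above, the order of steps (project, slice, normalize) is assumed, plain Python lists stand in for tensors, and the `head_dtype` cast is left out because plain floats have no dtype.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
VectorFn = Callable[[Vector], Sequence[float]]


@dataclass
class RequestParams:
    """Hypothetical per-request options, named after the fields in the summary."""
    dimensions: Optional[int] = None   # keep only the first N components
    normalize: bool = True             # apply the head's activation (L2 norm here)


def l2_normalize(v: Vector) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


class EmbeddingHeadSketch:
    def __init__(self,
                 projector: Optional[VectorFn] = None,
                 activation: Optional[VectorFn] = l2_normalize):
        self.projector = projector
        self.activation = activation

    def __call__(self, pooled: Vector, params: RequestParams) -> list[float]:
        out = list(pooled)
        # 1. Optional projection (for example a dense layer loaded with the model).
        if self.projector is not None:
            out = list(self.projector(out))
        # 2. Optional truncation to the requested number of dimensions.
        if params.dimensions is not None:
            out = out[:params.dimensions]
        # 3. Normalization only when the request asks for it.
        if params.normalize and self.activation is not None:
            out = list(self.activation(out))
        return out
```

With this ordering, a truncated embedding is re-normalized after slicing, so it still has unit length: `EmbeddingHeadSketch()([3.0, 4.0, 12.0], RequestParams(dimensions=2))` keeps `[3.0, 4.0]` and then scales it by its own norm.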
Note
Standardizes pooling heads and consistently applies `dimensions`, `normalize`, and classifier `use_activation` across models.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token-wise variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; constructors now take `VllmConfig` and remove inline head logic
- […] `__all__` in `pooler/common.py`; minor type hints and return cleanups
- […] `SupportsPP` from several MTP model classes without behavior changes

Written by Cursor Bugbot for commit e25bf36c6ebe2a3c3e1d772975f9658419870204.
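For the classifier side, a comparable sketch might look like the following. `ClassifierHeadSketch` is a hypothetical name, and how the real head combines `logit_bias` with the scores is not stated in the summary, so this sketch simply adds it; treat that as an assumption.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
VectorFn = Callable[[Vector], Sequence[float]]


def sigmoid(v: Vector) -> list[float]:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


class ClassifierHeadSketch:
    def __init__(self,
                 classifier: VectorFn,
                 activation: VectorFn = sigmoid,
                 logit_bias: Optional[float] = None):
        self.classifier = classifier      # maps a pooled vector to raw scores
        self.activation = activation      # applied only when the request asks
        self.logit_bias = logit_bias      # assumption: added to every score

    def __call__(self, pooled: Vector, use_activation: bool = True) -> list[float]:
        scores = list(self.classifier(pooled))
        if self.logit_bias is not None:
            scores = [s + self.logit_bias for s in scores]
        if use_activation:
            scores = list(self.activation(scores))
        return scores
```

Passing `use_activation=False` returns the (biased) raw scores, which is the per-request switch the summary refers to.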
Note
Unifies pooling head design and ensures `PoolingParams` (e.g., `normalize`, `use_activation`, `dimensions`) are applied consistently for sequence and token tasks.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` to accept `projector`, `head_dtype`, `activation`, and optional `logit_bias`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] BERT, BERT+RoPE, ModernBERT, and GritLM to use `EmbeddingPoolerHead`-based `SequencePooler`; constructors now take `VllmConfig` and replace custom inline heads
- […] `dimensions` slicing
- […] `pooler/common.py` and simplifies type hints/returns

Written by Cursor Bugbot for commit 610deac.
Note
Streamlines pooling across sequence and token tasks and ensures `PoolingParams` are honored consistently.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`; conditionally applies `normalize`/`use_activation` and `dimensions` slicing
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to build on `EmbeddingPoolerHead`; constructors now take `VllmConfig` and remove inline head logic
- […] `__all__` in `pooler/common.py` and simplifies types/returns; no behavior outside pooling paths is changed

Written by Cursor Bugbot for commit 630efb0.
Note
Unifies pooling head design and ensures `PoolingParams` are honored consistently for sequence and token tasks.
- […] `ActivationFn`; refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`, with conditional `normalize`/`use_activation` and `dimensions` slicing
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; constructors now take `VllmConfig` and remove inline head logic
- […] `pooler/common.py` and simplifies types/returns; no changes outside pooling paths

Written by Cursor Bugbot for commit b6a60c0.
Note
Standardizes pooling heads and enforces consistent application of `PoolingParams` across sequence and token tasks.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`
- […] `dimensions` slicing and conditional `normalize`/`use_activation` per request; dtype casting gated by `head_dtype`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; remove inline head logic and take `VllmConfig`/`ModelConfig` where appropriate
- […] `pooler/common.py` and tightens types/returns

Written by Cursor Bugbot for commit dd3eca8.
Note
Standardizes pooling across sequence and token tasks and applies `PoolingParams` (`dimensions`, `normalize`, `use_activation`) consistently.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` via `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; removes inline head logic and simplifies constructors to take `ModelConfig`/`VllmConfig`
- […] `head_dtype` and enforces Matryoshka `dimensions` slicing; exports `ActivationFn` in `pooler/common.py`

Written by Cursor Bugbot for commit 21e36dc.
Note
Unifies pooling head design and applies `PoolingParams` consistently across sequence and token tasks.
- […] `ActivationFn`; refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `normalize`/`use_activation` and `dimensions` slicing; casts gated by `head_dtype`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; removes inline head logic and simplifies constructors
- […] `pooler/common.py` and minor type/return cleanups

Written by Cursor Bugbot for commit 0f7f555.
Note
Unifies pooling head design and makes `PoolingParams` take effect consistently for sequence and token tasks.
- […] `ActivationFn`; refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `dimensions` slicing and conditional `normalize`/`use_activation`; dtype casting only when `head_dtype` is set
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; remove inline head logic and simplify constructors to `ModelConfig`/`VllmConfig`
- […] `pooler/common.py` and tightens types/returns

Written by Cursor Bugbot for commit 08d419d.
Note
Standardizes pooling behavior and makes `PoolingParams` (`dimensions`, `normalize`, `use_activation`) take effect consistently across sequence and token tasks.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`; gates dtype cast by `head_dtype`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; removes inline head logic and simplifies constructors to `ModelConfig`/`VllmConfig`
- […] `ActivationFn` in `pooler/common.py` and tightens types/returns

Written by Cursor Bugbot for commit d27da1d.
Note
Unifies pooling head design and enforces consistent `PoolingParams` handling across seq/token tasks.
- […] `ActivationFn`; refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`; conditionally applies `dimensions` slicing and per-request `normalize`/`use_activation`
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`; dtype casting gated by `head_dtype`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead`; removes inline head logic, simplifies constructors to `ModelConfig`/`VllmConfig`, and adjusts weight-loading where necessary

Written by Cursor Bugbot for commit da62346.
Note
Standardizes pooling behavior and makes `PoolingParams` (`dimensions`, `normalize`, `use_activation`) take effect consistently.
- […] `ActivationFn` and refactors `EmbeddingPoolerHead`/`ClassifierPoolerHead` (and token variants) to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`; dtype cast only when `head_dtype` is set
- […] `seqwise/poolers.py` and `tokwise/poolers.py` via `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead` with lambda projectors/activations; removes inline head logic and simplifies constructors
- […] `dimensions` slicing and per-request activation/normalization for both sequence and token paths
- […] `pooler/common.py` and tightens type hints/returns

Written by Cursor Bugbot for commit 1d98ecd.
Note
Standardizes pooling behavior and ensures `PoolingParams` (`dimensions`, `normalize`, `use_activation`) are honored consistently for sequence and token tasks.
- […] `ActivationFn`; expands exports in `pooler/common.py`
- […] `EmbeddingPoolerHead`/`ClassifierPoolerHead` and token variants to accept `projector`/`classifier`, `head_dtype`, `activation`, and optional `logit_bias`; conditionally cast dtype, slice matryoshka `dimensions`, and apply normalization/activation per request
- […] `seqwise/poolers.py` and `tokwise/poolers.py` using `get_current_vllm_config`, `_load_st_projector`, `PoolerNormalize`, and `resolve_classifier_act_fn`
- […] `BertPooler`, `BertWithRope`, `ModernBertPooler`, and `GritLMPooler` to use `EmbeddingPoolerHead` (via lambda projectors/activations); constructors simplified to take `ModelConfig`/`VllmConfig` and inline head logic removed

Written by Cursor Bugbot for commit 95e88bd.
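The "lambda projectors/activations" wiring mentioned in this summary could look roughly like the sketch below. All names (`GenericHeadSketch`, `BertLikePoolerSketch`) are hypothetical, and the dense-plus-tanh layer follows the standard BERT pooler rather than anything confirmed for this PR. The point is only that a model-specific pooler can keep its own weights and hand them to a shared head as callables.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
VectorFn = Callable[[Vector], Sequence[float]]


class GenericHeadSketch:
    """Shared head: applies whatever projector/activation it is given."""

    def __init__(self, projector: Optional[VectorFn], activation: Optional[VectorFn]):
        self.projector = projector
        self.activation = activation

    def __call__(self, pooled: Vector) -> list[float]:
        out = list(pooled)
        if self.projector is not None:
            out = list(self.projector(out))
        if self.activation is not None:
            out = list(self.activation(out))
        return out


class BertLikePoolerSketch:
    """Model-specific pooler: first-token pooling, dense layer, tanh."""

    def __init__(self, weight: Sequence[Vector], bias: Vector):
        self.weight = weight
        self.bias = bias
        # The model keeps its own parameters; the shared head only sees callables.
        self.head = GenericHeadSketch(
            projector=lambda x: self._dense(x),
            activation=lambda x: [math.tanh(v) for v in x],
        )

    def _dense(self, x: Vector) -> list[float]:
        # Plain matrix-vector product plus bias, one output per weight row.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]

    def __call__(self, hidden_states: Sequence[Vector]) -> list[float]:
        # Pool by taking the first token's hidden state, then run the head.
        return self.head(hidden_states[0])
```

Because the lambdas close over `self`, the head needs no knowledge of where the weights live or how they were loaded.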