
[Model] Standardize pooling heads #32148

Merged
noooop merged 13 commits into vllm-project:main from DarkLight1337:standardize-head
Jan 12, 2026

Conversation

@DarkLight1337 (Member) commented Jan 12, 2026

Purpose

Follow-up to #32119: make pooling params actually take effect for custom models.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Note

Unifies pooling head construction and applies pooling params consistently across models.

  • Introduces ActivationFn and refactors EmbeddingPoolerHead/ClassifierPoolerHead to accept projector, head_dtype, and activation; conditionally apply normalize/use_activation and dimensions slicing
  • Centralizes head instantiation in seqwise/tokwise poolers.py using get_current_vllm_config, _load_st_projector, PoolerNormalize, and resolve_classifier_act_fn
  • Updates BERT, BERT+RoPE, ModernBERT, and GritLM to use EmbeddingPoolerHead-based poolers; replace custom inline heads with configured projector+activation; simplify constructors to take VllmConfig
  • Ensures classifier heads support optional logit_bias and proper dtype; token heads mirror the same behavior
  • Expands __all__ and minor API cleanups (type hints, return values)

Written by Cursor Bugbot for commit 610deac.
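The head design described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not vLLM's actual code: the class and field names mirror the summary (`EmbeddingPoolerHead`, `PoolingParams`, `normalize`, `dimensions`), but the signatures are assumptions made for the example. The key idea is that the head receives its `projector` and `activation` as ready-made callables and applies per-request params (Matryoshka `dimensions` slicing, then L2 normalization) itself:

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

Vector = list[float]

@dataclass
class PoolingParams:
    """Per-request pooling options (illustrative subset)."""
    normalize: bool = True
    dimensions: Optional[int] = None  # Matryoshka truncation, if set

class EmbeddingPoolerHead:
    """Sketch of a head that takes its projector/activation as arguments
    instead of building them from global config (the PR's key change)."""

    def __init__(self,
                 projector: Optional[Callable[[Vector], Vector]] = None,
                 activation: Optional[Callable[[Vector], Vector]] = None):
        self.projector = projector
        self.activation = activation

    def __call__(self, pooled: Vector, params: PoolingParams) -> Vector:
        out = pooled
        if self.projector is not None:     # e.g. a SentenceTransformers Dense layer
            out = self.projector(out)
        if self.activation is not None:    # applied only when configured
            out = self.activation(out)
        if params.dimensions is not None:  # slice before normalizing
            out = out[:params.dimensions]
        if params.normalize:               # per-request L2 normalization
            norm = math.sqrt(sum(x * x for x in out)) or 1.0
            out = [x / norm for x in out]
        return out

head = EmbeddingPoolerHead()
vec = head([3.0, 4.0, 0.0], PoolingParams(normalize=True, dimensions=2))
# vec is the first two components, L2-normalized: [0.6, 0.8]
```

Slicing before normalizing matters for Matryoshka embeddings: the truncated vector must be re-normalized to unit length, which is why the sketch orders the steps this way.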


Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 requested a review from noooop as a code owner January 12, 2026 05:58
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 12, 2026
@noooop noooop enabled auto-merge (squash) January 12, 2026 06:00
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request is a nice refactoring that standardizes the pooling heads. By moving the configuration logic out of the head classes and into factory functions, you've made the heads more modular, reusable, and easier to test. The changes are applied consistently across bert, gritlm, and modernbert models. I've found one critical issue where a variable could be used before assignment, which would cause a runtime error.
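The factory pattern the review praises can be sketched as follows. All names here (`ModelConfig`, `resolve_activation`, `ClassifierHead`) are hypothetical, chosen for illustration: the point is that configuration is resolved once, in a factory function outside the head, and the head only receives ready-made callables, which makes it trivially testable in isolation:

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ModelConfig:
    """Illustrative config; the real vLLM config is far richer."""
    use_sigmoid: bool = False

def resolve_activation(cfg: ModelConfig) -> Optional[Callable[[float], float]]:
    # The factory inspects config; the head itself never sees cfg.
    if cfg.use_sigmoid:
        return lambda x: 1.0 / (1.0 + math.exp(-x))
    return None

class ClassifierHead:
    """Head receives the activation ready-made; no config access inside."""
    def __init__(self, activation: Optional[Callable[[float], float]]):
        self.activation = activation

    def __call__(self, logit: float, use_activation: bool) -> float:
        # use_activation is a per-request pooling param, honored here.
        if use_activation and self.activation is not None:
            return self.activation(logit)
        return logit

head = ClassifierHead(resolve_activation(ModelConfig(use_sigmoid=True)))
score = head(0.0, use_activation=True)    # sigmoid(0) == 0.5
raw = head(2.0, use_activation=False)     # activation skipped per request
```

In a unit test, the head can now be constructed with any activation directly, with no need to mock a global config object.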

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@noooop noooop merged commit 8863c2b into vllm-project:main Jan 12, 2026
57 checks passed
@DarkLight1337 DarkLight1337 deleted the standardize-head branch January 12, 2026 17:10
TomerBN-Nvidia pushed a commit to TomerBN-Nvidia/vllm that referenced this pull request Jan 13, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
wangxiyuan pushed a commit to vllm-project/vllm-ascend that referenced this pull request Jan 15, 2026
### What this PR does / why we need it?
Upgrade vllm commit to 0113 (11b6af5280d6d6dfb8953af16e67b25f819b3be9)

- Modify import paths due to the refactors
vllm-project/vllm#31916
vllm-project/vllm#32054

- Fix `TypeError: NPUOffloadingSpec.__init__() takes 2 positional
arguments but 3 were given` due to
vllm-project/vllm#24498

- Skip the async-scheduling tests in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are never
verified
vllm-project/vllm#31998

- Skip some pooling tests broken by
vllm-project/vllm#32148
where vLLM itself also fails:
https://buildkite.com/vllm/ci/builds/46705/steps/canvas?jid=019bb329-3834-4685-862b-1613b8e0f5d4

We will reopen those tests once main2main reaches
vllm-project/vllm#32243

- Skip some cases in
`tests/e2e/multicard/4-cards/long_sequence/test_mtp.py`, which are
broken by
vllm-project/vllm#32118

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.13.0
- vLLM main:
vllm-project/vllm@2f4e654

Signed-off-by: wjunLu <wjunlu217@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>
aipaes pushed a commit to aipaes/vllm-ascend that referenced this pull request Jan 15, 2026
sammysun0711 pushed a commit to sammysun0711/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
maoxx241 pushed a commit to maoxx241/vllm-ascend that referenced this pull request Mar 2, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
LCAIZJ pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Mar 7, 2026
