Always use embed&token_classify for bge-m3 #37632
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Code Review
This pull request refactors the bge-m3 sparse embedding processor to consistently use the embed&token_classify task for all pooling requests. This simplifies the logic in merge_pooling_params by removing conditional task assignments. Correspondingly, the post_process method has been updated to always calculate embed_dimensions, ensuring correct slicing of the model's output to extract dense and/or sparse embeddings as requested. The changes align with the stated purpose of supporting a fixed model serving configuration and appear to be implemented correctly.
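The slicing the review describes could look roughly like the sketch below. This is a minimal illustration, not the PR's actual code: the function shape, row layout, and names other than `embed_dimensions` and `embed_task` are assumptions.

```python
# Hypothetical sketch of the post_process slicing described in the review.
# Assumption: each pooled row holds the dense embedding in its first
# `embed_dimensions` values, followed by the per-token sparse weight(s).
def post_process(pooled_rows, embed_dimensions, embed_task):
    """Split a combined embed&token_classify output into dense/sparse parts."""
    dense = [row[:embed_dimensions] for row in pooled_rows]
    sparse = [row[embed_dimensions:] for row in pooled_rows]
    if embed_task == "dense":
        return dense
    if embed_task == "sparse":
        return sparse
    return dense, sparse

rows = [[0.1, 0.2, 0.3, 0.9],
        [0.4, 0.5, 0.6, 0.8]]
print(post_process(rows, 3, "dense"))   # first 3 values of each row
print(post_process(rows, 3, "sparse"))  # trailing value of each row
```

Because the model always runs the combined task, the slicing step is where the requested subset is selected, which is exactly the transfer/compute concern raised below.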
@noooop PTAL, thanks.
If the user only wants to use one type of embeddings, is it necessary for them to use the plugin? If it's necessary, then this would result in some unnecessary tensor transfer, since you always return both dense and sparse embeddings.
@DarkLight1337 Not only unnecessary tensor transfer, but also unnecessary GPU computation for both dense & sparse if the user only wants dense or sparse embeddings. The v2 runner only supports one pooling task at the starting point. Thus, for dense embedding, the user shall call
tests/plugins/bge_m3_sparse_plugin/bge_m3_sparse_processor/sparse_embeddings_processor.py
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
Purpose
As discussed in #35829 (comment), a model serving instance only serves one pooling task set at the starting point. Hence, for `bge-m3` with `dense`, `sparse`, or `dense&sparse` mode, this plugin uses `embed&token_classify` for all pooling requests, and returns `dense`, `sparse`, or `dense&sparse` embeddings according to the `embed_task` configured in the request. This is part of deprecating support for multi-task.
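As a rough illustration of the request side, a client might select the embedding type per request like the sketch below. The payload shape and wrapper keys are assumptions for illustration; only the `embed_task` values come from this PR.

```python
# Hypothetical helper building a pooling request payload. The keys
# "model", "input", and "additional_data" are assumed placements, not
# the plugin's documented schema; only `embed_task` is from the PR.
VALID_EMBED_TASKS = ("dense", "sparse", "dense&sparse")

def make_pooling_request(text, embed_task):
    if embed_task not in VALID_EMBED_TASKS:
        raise ValueError(f"embed_task must be one of {VALID_EMBED_TASKS}")
    return {
        "model": "BAAI/bge-m3",
        "input": text,
        "additional_data": {"embed_task": embed_task},
    }

print(make_pooling_request("hello world", "sparse"))
```

Server-side, every such request maps to the single `embed&token_classify` task, and the processor slices out the requested part(s) of the output.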
Test Plan
plugins_tests/test_bge_m3_sparse_io_processor_plugins.py
Test Result
All tests pass.

Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.