
always use embed&token_classify for bge-m3 #37632

Merged
noooop merged 3 commits into vllm-project:main from staugust:disbale_multi_task_for_bge_m3_plugin
Mar 23, 2026

Conversation

@staugust (Contributor) commented Mar 20, 2026

Purpose

As discussed in #35829 (comment), a model serving instance serves only one pooling task set, fixed at startup. Hence, for bge-m3 in dense, sparse, or dense&sparse mode, this plugin uses embed&token_classify for all pooling requests and returns dense, sparse, or dense&sparse outputs according to the embed_task configured in the request.

This is part of deprecating multi-task support.
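The dispatch described above can be sketched as follows. Names such as `merge_pooling_params` and `embed_task` follow the discussion in this PR, but the bodies are illustrative assumptions, not the actual plugin source:

```python
def merge_pooling_params(pooling_params: dict) -> dict:
    # The serving instance exposes a single pooling task set from startup,
    # so every request is routed to the combined task (assumption based on
    # the PR description).
    pooling_params["task"] = "embed&token_classify"
    return pooling_params


def select_outputs(embed_task: str, dense, sparse) -> dict:
    # Return dense, sparse, or both, according to the per-request setting.
    if embed_task == "dense":
        return {"dense": dense}
    if embed_task == "sparse":
        return {"sparse": sparse}
    return {"dense": dense, "sparse": sparse}
```

The model always runs the combined task; only the response payload is filtered per request.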

Test Plan

plugins_tests/test_bge_m3_sparse_io_processor_plugins.py

Test Result

All tests passed.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request refactors the bge-m3 sparse embedding processor to consistently use the embed&token_classify task for all pooling requests. This simplifies the logic in merge_pooling_params by removing conditional task assignments. Correspondingly, the post_process method has been updated to always calculate embed_dimensions, ensuring correct slicing of the model's output to extract dense and/or sparse embeddings as requested. The changes align with the stated purpose of supporting a fixed model serving configuration and appear to be implemented correctly.
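The slicing that the review describes can be illustrated with a minimal sketch. It assumes the combined embed&token_classify output places the dense embedding (of length `embed_dimensions`) before the token-level sparse weights; this layout and the helper name are assumptions, not taken from the plugin source:

```python
def split_pooled_output(pooled: list[float], embed_dimensions: int):
    # Assumed layout: the first `embed_dimensions` values are the dense
    # embedding; the remainder holds the token-level (sparse) weights.
    dense = pooled[:embed_dimensions]
    sparse = pooled[embed_dimensions:]
    return dense, sparse
```

Because the task is always embed&token_classify, `embed_dimensions` is always known, so the slice can be computed unconditionally in post_process.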

@staugust (Contributor, Author)

@noooop PTAL, thanks.

@noooop noooop requested a review from DarkLight1337 March 20, 2026 02:48
@DarkLight1337 DarkLight1337 requested a review from noooop March 20, 2026 03:02
@DarkLight1337 (Member) commented Mar 20, 2026

If the user only wants to use one type of embedding, is it necessary for them to use the plugin? If so, this would result in some unnecessary tensor transfer, since you always return both dense and sparse embeddings.

@staugust (Contributor, Author)

@DarkLight1337 It is not only unnecessary tensor transfer, but also unnecessary GPU computation for both dense&sparse if the user only wants dense or sparse embeddings. The v2 runner supports only one pooling task, fixed at startup. Thus, for dense embeddings, the user must call /v1/embeddings against a vLLM instance whose pooling task is set to embed; for sparse embeddings, call /token_classify against an instance whose pooling task is set to token_classify. For dense&sparse, the instance must set the pooling task to embed&token_classify. I do not think the unnecessary tensor transfer and GPU computation can be avoided by this plugin; it all happens inside the vLLM core engine.
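The deployment options in this comment can be summarized as a small mapping. The endpoint paths are copied from the comment above and should be treated as assumptions about the serving setup rather than a verified configuration:

```python
# One vLLM instance per pooling task set, per the comment above.
MODE_TO_SERVING = {
    "dense": {"pooling_task": "embed", "endpoint": "/v1/embeddings"},
    "sparse": {"pooling_task": "token_classify", "endpoint": "/token_classify"},
    # The combined mode is what this plugin serves; it requires the
    # embed&token_classify task at startup.
    "dense&sparse": {"pooling_task": "embed&token_classify",
                     "endpoint": "handled by this plugin"},
}
```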

@noooop (Collaborator) commented Mar 20, 2026

In previous versions, users could request dense, sparse, or dense & sparse. Now, suppose only the "dense & sparse" pooler is used to implement these features. I assume there should be some filtering code, but I couldn't find it. Could you please explain how these features are currently implemented?

@noooop (Collaborator) left a comment


Thanks!

@noooop noooop enabled auto-merge (squash) March 23, 2026 02:31
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 23, 2026
@noooop noooop merged commit 6e04e79 into vllm-project:main Mar 23, 2026
13 checks passed
yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Mar 23, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
RhizoNymph pushed a commit to RhizoNymph/vllm that referenced this pull request Mar 26, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
HenryTangDev pushed a commit to HenryTangMain/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
SouthWest7 pushed a commit to SouthWest7/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
nithinvc pushed a commit to nithinvc/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>

Labels

ready ONLY add when PR is ready to merge/full CI is needed


3 participants