[Platform] Make forward context manager pluggable for other device #29388
shen-shanshan wants to merge 4 commits into vllm-project:main
Conversation
Signed-off-by: shen-shanshan <467638484@qq.com>
Code Review
This pull request refactors the usage of set_forward_context to be pluggable, allowing different platforms to provide their own forward context manager. This is achieved by introducing a get_forward_context_manager method in the Platform interface. The implementation is sound and improves modularity. My only feedback is on a minor code duplication issue in qwen2_5_vl.py where current_platform is imported locally in two separate methods. I've suggested a refactoring to improve maintainability.
`qwen2_5_vl.py` (reviewed hunk):

```python
            self.language_model.make_empty_intermediate_tensors
        )

        from vllm.platforms import current_platform
```
I think there is no need to import this lazily; just import it at the top level.
done.
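For reference, a minimal before/after sketch of the suggested change (the stub class and method bodies are illustrative, not the real diff):

```python
# Before (flagged in review): the same import repeated lazily in two methods.
class Qwen2_5_VLForConditionalGeneration:  # illustrative stub
    def _process_image_input(self, image_input):
        from vllm.platforms import current_platform
        return current_platform.get_forward_context_manager()

    def _process_video_input(self, video_input):
        from vllm.platforms import current_platform
        return current_platform.get_forward_context_manager()


# After: one module-level import serves both call sites.
from vllm.platforms import current_platform
```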
Reviewed hunk, in the `Platform` interface:

```python
        return max_model_len

    @classmethod
    def get_forward_context_manager(cls):
```
Can you add a type hint here?
done.
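The resolved signature presumably looks something like the following; the exact return annotation is an assumption on my part:

```python
from contextlib import AbstractContextManager
from typing import Callable

from vllm.forward_context import set_forward_context


class Platform:  # sketch of the hook added to the Platform interface
    @classmethod
    def get_forward_context_manager(cls) -> Callable[..., AbstractContextManager]:
        # Default to vLLM's stock context manager; out-of-tree platforms
        # can override this method to return their own.
        return set_forward_context
```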
After discussing offline with @wangxiyuan, this PR is not needed and we decided to directly refactor our …
…VisionAttention (#4349)

### What this PR does / why we need it?
- [x] Patch `Qwen2_5_VisionAttention` with `AscendQwen2_5_VisionAttention`.
- [x] Replace `AscendQwen2_5_VisionTransformer` with `Qwen2_5_VisionTransformer` in vllm.
- [x] Move padding logic (q/k/v and cos/sin) before FA to `forward()` of `Qwen2_5_VisionAttention`.
- [x] Convert `cu_seqlens` in `Qwen2_5_VisionAttention` from cumulative form to intervals and move it to CPU (compatible with NPU FA; see the sketch after this message).
- [x] Remove Qwen2.5-VL modeling files.
- [x] Remove Qwen2.5-VL (without padding) modeling files.
- [x] Remove related UT.
- [x] Make `set_forward_context` pluggable when getting MM embeddings. Find more details at vllm-project/vllm#29388.
- [x] Simplify padding logic for FA.
- [x] Add patch for vllm-project/vllm#28798.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- [x] Functional test (eager mode)
- [x] Functional test (graph mode)
- [x] Benchmark

- vLLM version: v0.11.2

---------
Signed-off-by: shen-shanshan <467638484@qq.com>
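The `cu_seqlens` item above is a small transform; here is a minimal sketch of converting the cumulative offsets into per-sequence lengths on CPU (the variable names and the exact interval layout expected by the NPU kernel are assumptions):

```python
import torch

# Cumulative form produced upstream: cu_seqlens[i] is the start offset of
# sequence i, and the final entry is the total token count.
cu_seqlens = torch.tensor([0, 4, 10, 15])

# Interval form: one length per sequence, moved to CPU for the NPU FA kernel.
seq_lens = torch.diff(cu_seqlens).cpu()
print(seq_lens.tolist())  # [4, 6, 5]
```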
Purpose
Currently, `set_forward_context` is used directly in some modeling files, such as Qwen2.5-VL (when processing image/video input with the ViT). We want to use a custom forward context there (without modifying the modeling files from the plugin side), such as https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/ascend_forward_context.py#L58.
Thus, this PR makes the forward context manager pluggable for other devices.
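For illustration, a minimal sketch of both sides of the hook, assuming the `get_forward_context_manager` name from this PR; the plugin module, class name, and call-site details are hypothetical:

```python
from contextlib import AbstractContextManager
from typing import Callable

from vllm.platforms.interface import Platform


class MyNPUPlatform(Platform):
    """Hypothetical out-of-tree platform (e.g. a vllm-ascend style plugin)."""

    @classmethod
    def get_forward_context_manager(cls) -> Callable[..., AbstractContextManager]:
        # Return the plugin's own context manager instead of vLLM's stock
        # set_forward_context (hypothetical module path below).
        from my_plugin.forward_context import set_custom_forward_context
        return set_custom_forward_context
```

A modeling file then looks the manager up through the platform instead of importing `set_forward_context` directly, roughly:

```python
from vllm.platforms import current_platform

forward_context = current_platform.get_forward_context_manager()
with forward_context(None, self.vllm_config):
    image_embeds = self.visual(pixel_values, grid_thw=grid_thw)
```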
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.