[Core] Deprecate `xformers` by ywang96 · Pull Request #29262 · vllm-project/vllm

ywang96 · 2025-11-23T08:36:37Z

Purpose

Reopened from #28287

This PR completely removes the dependency on the xformers library and should only be merged after the v0.11.1 release. The rationale for removing xformers is:

- xformers is used for multimodal attention (MHA), but alternative attention backends can replace it.
- We have an xformers attention backend for decoder LMs, but it is no longer used for anything.
- Every additional external dependency adds risk to our releases, a hard lesson we learned while upgrading to PyTorch 2.9.
- #28763 ([Attention] FA2 support more head sizes, ViT support, make default backend) added FlashAttention support for head sizes that were previously unsupported, which makes xformers no longer necessary.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Roger Wang <hey@rogerw.io>

mergify · 2025-11-23T08:37:11Z

Documentation preview: https://vllm--29262.org.readthedocs.build/en/29262/

gemini-code-assist

Code Review

This pull request effectively deprecates the xformers dependency. The changes are comprehensive, removing xformers from requirements, Dockerfiles, documentation, and tests. The core logic is updated to remove the xformers attention backend, with TORCH_SDPA being used as a fallback in some cases, such as in the keye model. The pixtral model, which still relies on xformers, has been updated with a comment to clarify that xformers is now an optional dependency for that specific model. The changes are clean and well-aligned with the goal of deprecating xformers.
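The fallback and optional-dependency behavior described in this review can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the helper names `is_installed`, `pick_vision_attn_backend`, and `require_optional` are hypothetical, and the backend names are plain strings used only for illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pick_vision_attn_backend(flash_attn_ok: bool) -> str:
    """Hypothetical selector: prefer FlashAttention, otherwise fall back to SDPA.

    The returned strings are illustrative labels, not vLLM's actual enum values.
    """
    return "FLASH_ATTN" if flash_attn_ok else "TORCH_SDPA"


def require_optional(module_name: str, used_by: str) -> None:
    """Raise an actionable error if a model-specific optional package is missing."""
    if not is_installed(module_name):
        raise ImportError(
            f"{used_by} needs the optional package '{module_name}'. "
            f"Install it with `pip install {module_name}`."
        )
```

A model that still relies on xformers could call something like `require_optional("xformers", used_by="pixtral")` at load time, so that users who never touch that model do not need the package installed.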

ywang96 · 2025-11-23T08:44:59Z

@codex review

ywang96 · 2025-11-23T08:45:25Z

Turning on CI to make sure there's no regression.

chatgpt-codex-connector · 2025-11-23T08:50:22Z

Codex Review: Didn't find any major issues. Keep it up!

DarkLight1337

LGTM if tests pass

1. fix vllm-project/vllm#28542 The model structure modifications involved are:
- Qwen2.5-VL (some patches still exist)
- Qwen2-VL
- Qwen2
- DeepSeek series
- Qwen-moe series
2. fix vllm-project/vllm#29121 the output token type has changed from np to `list[list[int]]`
3. fix vllm-project/vllm#29262 the `xformers` backend for multimodal has been deprecated
4. fix vllm-project/vllm#29342
5. fix vllm-project/vllm#28579
6. fix vllm-project/vllm#28718
7. fix vllm-project/vllm#28665
8. fix vllm-project/vllm#26847 vllm introduced the `optimization-level`, some default configs have changed, and the param `--enforce-eager` has been deprecated
9. fix https://github.com/vllm-project/vllm/pull/29223 the sampler now returns a tuple.
10. fix vllm-project/vllm#29471 we'll remove the related patch to avoid this kind of error.

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: hfadzxy <starmoon_zhang@163.com>
Co-authored-by: wangli <wangli858794774@gmail.com>
Co-authored-by: hfadzxy <starmoon_zhang@163.com>

This is not needed anymore after vllm-project/vllm#29262, so we only need to build the vLLM wheel on PyTorch CI now.
1. vllm-project/vllm#29262 has removed xformers from vLLM, and this PR applies that change on our end
2. This vLLM nightly wheel will be used to power nightly benchmark runs on PyTorch CI. It's good that we just need to rebuild vLLM now and none of its dependencies
3. I'm trying to get rid of the Dockerfile on PyTorch eventually, and just use the official one from vLLM instead. This is a work in progress

Pull Request resolved: #169914
Approved by: https://github.com/zou3519

remove

c80d28e

Signed-off-by: Roger Wang <hey@rogerw.io>

ywang96 requested review from LucasWilkinson, WoosukKwon, jeejeelee, mgoin, patrickvonplaten, sighingnow, tlrmchlsmth and yewentao256 as code owners November 23, 2025 08:36

ywang96 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Nov 23, 2025

mergify bot added the documentation (Improvements or additions to documentation), ci/build, qwen (Related to Qwen models), and nvidia labels Nov 23, 2025

github-project-automation bot added this to NVIDIA Nov 23, 2025

mergify bot added the v1 label Nov 23, 2025

gemini-code-assist bot reviewed Nov 23, 2025

add error message

f7caefb

Signed-off-by: Roger Wang <hey@rogerw.io>

ywang96 requested review from ProExpertProg, hmellor, houseroad, robertgshaw2-redhat and youkaichao as code owners November 23, 2025 08:57

refine

eb1e2af

Signed-off-by: Roger Wang <hey@rogerw.io>

DarkLight1337 approved these changes Nov 23, 2025

github-project-automation bot moved this to In review in NVIDIA Nov 23, 2025

typo

a52aaf7

Signed-off-by: Roger Wang <hey@rogerw.io>

DarkLight1337 merged commit 0ff7082 into vllm-project:main Nov 24, 2025
90 checks passed

github-project-automation bot moved this from In review to Done in NVIDIA Nov 24, 2025

lpapavassiliou pushed a commit to lpapavassiliou/vllm that referenced this pull request Nov 24, 2025

[Core] Deprecate xformers (vllm-project#29262)

8916b0b

Signed-off-by: Roger Wang <hey@rogerw.io>

RunkaiTao pushed a commit to RunkaiTao/vllm that referenced this pull request Nov 24, 2025

[Core] Deprecate xformers (vllm-project#29262)

92756cb

Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>

devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

[Core] Deprecate xformers (vllm-project#29262)

54757a0

Signed-off-by: Roger Wang <hey@rogerw.io>

Potabk mentioned this pull request Dec 1, 2025

[Main] Upgrade vllm commit to 2025_12_01 vllm-project/vllm-ascend#4527

Closed

wangxiyuan mentioned this pull request Dec 1, 2025

upgrade vLLM to main vllm-project/vllm-ascend#4608

Merged

kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025

[Core] Deprecate xformers (vllm-project#29262)

a46c7e8

Signed-off-by: Roger Wang <hey@rogerw.io>

huydhn mentioned this pull request Dec 9, 2025

[vLLM] Remove xformers in vLLM build workflow pytorch/pytorch#169914

Closed

oscardev256 added a commit to oscardev256/vllm that referenced this pull request Dec 25, 2025

1. Remove upstream fa checks (vllm-project#29471)

535acbd

2. Remove deprecated xformers (vllm-project#29262)
3. Updated _get_prompt_updates()

Signed-off-by: Oscar Gonzalez <ogonzal6@alumni.jh.edu>

Isotr0py mentioned this pull request Dec 28, 2025

[v1] Add encoder-only/cross attention support to Triton Attention backend #31406

Merged

[Core] Deprecate `xformers` #29262
DarkLight1337 merged 5 commits into vllm-project:main from ywang96:remove-xformers-2

6 participants