[Version] Drop 0.16.0 support #7153
Conversation
Summary of Changes: This pull request streamlines the project's dependency on vLLM by dropping support for version 0.16.0 and standardizing on version 0.17.0. This simplifies the codebase by removing version-specific conditional logic, ensuring that all components are built and run against the latest supported vLLM release. The update also includes minor adjustments to the installation instructions, reflecting a more current setup for the Ascend environment.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request effectively removes support for vLLM 0.16.0 by updating version tags in Dockerfiles and documentation, and by removing conditional logic for version 0.16.0 from the codebase. The changes are consistent and align with the goal of dropping support for the older version. I have also provided a suggested PR title and summary that align with the repository's style guide. I found no critical or high-severity issues in the code changes.
Suggested PR Title:
[Version][Misc] Drop vLLM 0.16.0 support

Suggested PR Summary:
### What this PR does / why we need it?
This pull request removes support for vLLM version 0.16.0 and updates the codebase to align with vLLM version 0.17.0.
Key changes include:
- Updating the `VLLM_TAG` in all `Dockerfile`s to `v0.17.0`.
- Removing conditional code paths that were specific to vLLM `v0.16.0`, simplifying the logic in `fused_moe.py`, `patch_v2_eagle.py`, and `model_runner_v1.py`.
- Updating the documentation configuration in `docs/source/conf.py` to reflect the new version.
This cleanup simplifies maintenance and ensures compatibility with the latest vLLM features.
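The removed conditionals follow a common version-gating pattern. A minimal sketch of the kind of check being dropped is shown below; the `vllm_version_is` helper and `select_moe_impl` function are illustrative, not the project's actual code:

```python
# Illustrative sketch of a version gate of the kind removed in this PR.
# Names here are hypothetical, not the project's real utilities.
from importlib.metadata import PackageNotFoundError, version


def vllm_version_is(target: str) -> bool:
    """Return True if the installed vLLM version matches the target string."""
    try:
        return version("vllm") == target
    except PackageNotFoundError:
        # vLLM not installed in this environment.
        return False


def select_moe_impl() -> str:
    # Before this PR: branch on the old release.
    if vllm_version_is("0.16.0"):
        return "legacy_fused_moe"  # 0.16.0-only path (now deleted)
    # After this PR: only the 0.17.0 path remains, unconditionally.
    return "fused_moe"
```

Dropping the branch means the 0.17.0 code path runs everywhere, which is what makes the `fused_moe.py`, `patch_v2_eagle.py`, and `model_runner_v1.py` simplifications possible.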
### Does this PR introduce _any_ user-facing change?
No, this is an internal dependency update and code cleanup. It does not introduce any user-facing changes.
### How was this patch tested?
CI passed with existing tests.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: MengqingCao <cmq0113@163.com>
/nightly all
Let's merge this quickly, as all e2e tests passed in https://github.com/vllm-project/vllm-ascend/actions/runs/23034989358/job/66901024673?pr=7153
### What this PR does / why we need it?
Drop 0.16.0 support in main.
- Fix the eagle proposer break introduced by vllm-project/vllm#34552, mainly by using the draft attention group to initialize the attention metadata builder.
- Fix the `ModelRunner` has no attribute `cudagraph_capture_sizes` error, a bug in vLLM v0.17.0 that was fixed by a later PR, vllm-project/vllm#30515.
- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@4034c3d

Signed-off-by: MengqingCao <cmq0113@163.com>
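The `cudagraph_capture_sizes` error arises when an attribute is only set on the runner after the upstream fix. A hedged sketch of the defensive fallback pattern used to tolerate such a missing attribute (class and function names below are illustrative, mirroring the error message rather than real vLLM code):

```python
# Illustrative sketch: guarding against a missing attribute on a runner
# object. Names mirror the error message, not actual vLLM internals.
class ModelRunner:
    # In the buggy release, cudagraph_capture_sizes was never assigned,
    # so direct attribute access raised AttributeError.
    pass


def get_capture_sizes(runner, default=()):
    # getattr with a default avoids AttributeError on the unpatched version.
    return getattr(runner, "cudagraph_capture_sizes", default)


runner = ModelRunner()
print(get_capture_sizes(runner))  # → ()
```

Once the project requires only vLLM v0.17.0 with the upstream fix applied, this kind of defensive access can be dropped in favor of reading the attribute directly.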