[Version] Drop 0.16.0 support #7153
Conversation
Summary of Changes: This pull request streamlines the project's dependency on vLLM by dropping support for version 0.16.0 and standardizing on version 0.17.0. This simplifies the codebase by removing version-specific conditional logic, ensuring that all components are built and run against the latest supported vLLM release. The update also includes minor adjustments to the installation instructions, reflecting a more current setup for the Ascend environment.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request effectively removes support for vLLM 0.16.0 by updating version tags in Dockerfiles and documentation, and by removing conditional logic for version 0.16.0 from the codebase. The changes are consistent and align with the goal of dropping support for the older version. I have also provided a suggested PR title and summary that align with the repository's style guide. I found no critical or high-severity issues in the code changes.
Suggested PR Title:
[Version][Misc] Drop vLLM 0.16.0 support

Suggested PR Summary:
### What this PR does / why we need it?
This pull request removes support for vLLM version 0.16.0 and updates the codebase to align with vLLM version 0.17.0.
Key changes include:
- Updating the `VLLM_TAG` in all `Dockerfile`s to `v0.17.0`.
- Removing conditional code paths that were specific to vLLM `v0.16.0`, simplifying the logic in `fused_moe.py`, `patch_v2_eagle.py`, and `model_runner_v1.py`.
- Updating the documentation configuration in `docs/source/conf.py` to reflect the new version.
This cleanup simplifies maintenance and ensures compatibility with the latest vLLM features.
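The removed conditionals follow a common version-gating pattern. A minimal sketch of the kind of check being dropped is shown below; the `vllm_version_is` helper and `select_moe_impl` function are illustrative, not the project's actual code:

```python
# Illustrative sketch of a version gate of the kind removed in this PR.
# Names here are hypothetical, not the project's real utilities.
from importlib.metadata import PackageNotFoundError, version


def vllm_version_is(target: str) -> bool:
    """Return True if the installed vLLM version matches the target string."""
    try:
        return version("vllm") == target
    except PackageNotFoundError:
        # vLLM not installed in this environment.
        return False


def select_moe_impl() -> str:
    # Before this PR: branch on the old release.
    if vllm_version_is("0.16.0"):
        return "legacy_fused_moe"  # 0.16.0-only path (now deleted)
    # After this PR: only the 0.17.0 path remains, unconditionally.
    return "fused_moe"
```

Dropping the branch means the 0.17.0 code path runs everywhere, which is what makes the `fused_moe.py`, `patch_v2_eagle.py`, and `model_runner_v1.py` simplifications possible.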
### Does this PR introduce _any_ user-facing change?
No, this is an internal dependency update and code cleanup. It does not introduce any user-facing changes.
### How was this patch tested?
CI passed with existing tests.
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: MengqingCao <cmq0113@163.com>
/nightly all
Let's merge this quickly, as all e2e tests passed in https://github.com/vllm-project/vllm-ascend/actions/runs/23034989358/job/66901024673?pr=7153
### What this PR does / why we need it?
Drop 0.16.0 support in main.
- Fix the eagle proposer break introduced by vllm-project/vllm#34552, mainly by using the draft attention group to initialize the attention metadata builder.
- Fix the `ModelRunner` has no attribute `cudagraph_capture_sizes` error, a bug in vLLM v0.17.0 that was fixed by a later PR, vllm-project/vllm#30515.
- vLLM version: v0.16.0
- vLLM main: vllm-project/vllm@4034c3d

Signed-off-by: MengqingCao <cmq0113@163.com>
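The `cudagraph_capture_sizes` error arises when an attribute is only set on the runner after the upstream fix. A hedged sketch of the defensive fallback pattern used to tolerate such a missing attribute (class and function names below are illustrative, mirroring the error message rather than real vLLM code):

```python
# Illustrative sketch: guarding against a missing attribute on a runner
# object. Names mirror the error message, not actual vLLM internals.
class ModelRunner:
    # In the buggy release, cudagraph_capture_sizes was never assigned,
    # so direct attribute access raised AttributeError.
    pass


def get_capture_sizes(runner, default=()):
    # getattr with a default avoids AttributeError on the unpatched version.
    return getattr(runner, "cudagraph_capture_sizes", default)


runner = ModelRunner()
print(get_capture_sizes(runner))  # → ()
```

Once the project requires only vLLM v0.17.0 with the upstream fix applied, this kind of defensive access can be dropped in favor of reading the attribute directly.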