[Misc] Upgrade vllm version to 0408 #8060
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request updates the vLLM dependency to a newer commit and aligns the Ascend-specific implementation with recent upstream architectural changes. Key adjustments include refactoring the MoE runner to handle gate parameter storage correctly and cleaning up the input batch initialization logic to reduce code duplication and improve maintainability.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request updates the vLLM commit hash across multiple Dockerfiles and documentation files to synchronize with the latest upstream changes. It also refactors the MoE runner initialization in vllm_ascend to resolve circular dependencies and potential AttributeError issues by bypassing the FusedMoE.shared_experts property during object construction. Additionally, it cleans up unused imports and refines the NPUInputBatch initialization logic to improve compatibility with upstream structures. The review comments identified critical circular dependency risks in the _init_runner methods, suggesting safer getattr usage to prevent initialization failures.
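As a rough illustration of the safer getattr usage the review recommends, here is a minimal sketch; the class, attribute, and parameter names are hypothetical stand-ins, not the actual vllm_ascend code:

```python
class MoERunner:
    """Sketch of a runner that avoids the FusedMoE.shared_experts property
    during construction (the property can re-enter initialization or raise
    AttributeError before the layer is fully set up)."""

    def __init__(self, moe_layer):
        # Read the backing attribute directly with a default instead of
        # going through the property; construction then cannot fail even
        # if upstream reorders layer initialization.
        self.shared_experts = getattr(moe_layer, "_shared_experts", None)
```

Because getattr is given a default, it never raises, so the runner tolerates layers that have not populated the attribute yet.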
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Force-pushed from 7819761 to 786cde3.
Signed-off-by: wangli <wangli858794774@gmail.com>
```
@@ -49,9 +49,7 @@ RUN pip config set global.index-url ${PIP_INDEX_URL} && \
# Install vLLM
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_COMMIT=v0.19.0
```
What this PR does / why we need it?
For the FusedMoE refactor:
vllm-project/vllm#33049
vllm-project/vllm#35949
For qwen3_vl:
vllm-project/vllm#34539
A new Triton kernel was added upstream for fast RoPE position encoding. I've added a patch that falls back to the native implementation (a sketch follows below); we'll consider registering custom operators and implementing an Ascend version later.
vllm-project/vllm#38361
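As context for the fallback patch mentioned above, here is a minimal sketch of a native (pure PyTorch) RoPE application; the function name and signature are hypothetical and do not reproduce the actual patch:

```python
import torch


def native_rope(q: torch.Tensor, k: torch.Tensor,
                cos: torch.Tensor, sin: torch.Tensor):
    """Apply rotary position embedding without the Triton kernel.

    Assumes `cos` and `sin` are already broadcastable to the shapes of
    `q` and `k`; useful where Triton is unavailable, e.g. on Ascend NPUs.
    """

    def rotate_half(x: torch.Tensor) -> torch.Tensor:
        # Split the last (head) dimension in half and rotate the pair:
        # (x1, x2) -> (-x2, x1), the standard RoPE rotation trick.
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```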
Does this PR introduce any user-facing change?
How was this patch tested?
- vLLM version:
- vLLM main: vllm-project/vllm@29e4870