Main2main Upgrade vllm commit to 0320 17:00 #7510
wangxiyuan merged 2 commits into vllm-project:main
Summary of Changes (Gemini Code Assist)
This pull request focuses on upgrading the vLLM commit to a specific version (0320 17:00) and ensuring compatibility with vLLM version 0.17.0. It introduces conditional logic based on the vLLM version to adapt the behavior of certain components within the vLLM-Ascend project. The changes primarily involve incorporating version checking.
Code Review
This pull request updates the vLLM commit reference in the documentation and introduces conditional logic across several files to ensure compatibility with vLLM version 0.17.0. This adaptation addresses API changes in the upstream vLLM library, specifically concerning StatelessProcessGroup and the handling of virtual_engine within the forward context. The changes are well-contained within conditional blocks, allowing the codebase to support different vLLM versions. The documentation update reflects the new vLLM commit hash.
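The version-conditional pattern the review describes can be sketched roughly as follows. This is a minimal illustration, not the project's code: `parse_version`, `version_is`, and `build_forward_context_kwargs` are hypothetical names, and which side of the branch still passes `virtual_engine` is an assumption made only for the example.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.17.0' or 'v0.17.0' into (0, 17, 0); a local suffix such as '+cpu' is ignored."""
    core = text.lstrip("v").split("+", 1)[0]
    numbers = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev5'
        numbers.append(int(match.group()))
    return tuple(numbers)


def version_is(installed: str, target: str) -> bool:
    """Hypothetical helper: True when both strings name the same release numbers."""
    return parse_version(installed) == parse_version(target)


def build_forward_context_kwargs(installed_version: str) -> dict:
    """Build keyword arguments for a forward-context call, gated on the version.

    Assumption for illustration only: the 0.17.0 release still expects
    `virtual_engine`, while newer upstream commits do not.
    """
    kwargs = {"attn_metadata": None}
    if version_is(installed_version, "0.17.0"):
        kwargs["virtual_engine"] = 0
    return kwargs
```

Keeping the branch in one small helper means the rest of the code can call the upstream API the same way regardless of which vLLM version is installed.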
This pull request has conflicts, please resolve those before we can evaluate the pull request.
8a5b756 to 9ed3c61
Signed-off-by: leo-pony <nengjunma@outlook.com>
Root causes:
- AscendMoERunner enters the forward_impl_chunked path on DP runs due to the flashinfer_all2allv backend causing use_dp_chunking=True; fix by overriding use_dp_chunking=False in AscendMoERunner
- compile_ranges_endpoints is None when update_compile_ranges_split_points runs after vLLM moved _set_compile_ranges() to after check_and_update_config; fix by returning [] instead of None in _get_compile_ranges

Upstream commit range: 6a9cceb..ed359c4

Co-Authored-By: Claude Code <noreply@anthropic.com>
Signed-off-by: leo-pony <nengjunma@outlook.com>
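The first root cause above can be illustrated with a small sketch. The classes here are stand-ins written for this example (`DefaultRunner`, `AscendLikeRunner`, and `moe_forward` are hypothetical and do not reproduce the vLLM or vLLM-Ascend implementations); only the attribute and method names `use_dp_chunking`, `forward_impl_chunked`, and the `flashinfer_all2allv` backend string come from the commit message.

```python
class DefaultRunner:
    """Stand-in for an upstream MoE runner; not the real vLLM class."""

    def __init__(self, all2all_backend: str):
        self.all2all_backend = all2all_backend

    @property
    def use_dp_chunking(self) -> bool:
        # Assumption: this backend switches chunking on, as the commit message reports.
        return self.all2all_backend == "flashinfer_all2allv"

    def forward_impl(self, x):
        return ("unchunked", x)

    def forward_impl_chunked(self, x):
        return ("chunked", x)


class AscendLikeRunner(DefaultRunner):
    """Stand-in for a platform runner that never wants the chunked path."""

    @property
    def use_dp_chunking(self) -> bool:
        # Override: always report False so the dispatcher stays on forward_impl.
        return False


def moe_forward(runner, x):
    """Dispatch the way the refactored upstream function is described as doing."""
    if runner.use_dp_chunking:
        return runner.forward_impl_chunked(x)
    return runner.forward_impl(x)
```

With the override in place, `moe_forward(AscendLikeRunner("flashinfer_all2allv"), x)` takes the unchunked path even though the backend string would otherwise select chunking.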
9ed3c61 to 17d3c99
### What this PR does / why we need it?
Main2main Upgrade vllm commit to 0320 17:00

1. Fix: vLLM refactored `_moe_forward` to call `runner.forward_impl_chunked()` when `runner.use_dp_chunking` is True. vLLM PR: "[MoE Refactor] DefaultMoERunner simplification" [#33049](vllm-project/vllm#33049)
2. Fix: vLLM moved the call to `self._set_compile_ranges()` in `VllmConfig.__post_init__` from **before** `check_and_update_config()` to **after** it (to allow platforms to lower `max_num_batched_tokens` first). vLLM PR: "fix(xpu): Re-compute compile ranges after platform-specific config updates" [#37523](vllm-project/vllm#37523)

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8b63257

---------

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
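The second item in the description, the compile-range ordering change, comes down to a platform hook running before the endpoints have been set. A minimal sketch of the "return `[]` instead of `None`" guard follows; `get_compile_ranges` and `update_split_points` are hypothetical names, and the filtering logic is an assumption added only to give the example something to do.

```python
def get_compile_ranges(endpoints):
    """Return the endpoints as a list, or an empty list when they are not set yet."""
    if endpoints is None:
        return []
    return list(endpoints)


def update_split_points(endpoints, max_num_batched_tokens):
    """Hypothetical platform hook that may run before the endpoints exist.

    It keeps the unique endpoints that fit within the token budget, in
    ascending order. Because get_compile_ranges never returns None, iterating
    here is safe regardless of call order.
    """
    ranges = get_compile_ranges(endpoints)
    return sorted({point for point in ranges if point <= max_num_batched_tokens})
```

Returning an empty list keeps callers free of `None` checks, so the hook behaves the same whether it runs before or after the ranges are computed.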