[BugFix] Handle num_cached_tokens/num_external_computed_tokens for different vllm version#8426
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces version-specific logic across several schedulers (recompute_scheduler.py, scheduler_dynamic_batch.py, scheduler_profiling_chunk.py, and patch_balance_schedule.py) to handle num_cached_tokens and num_external_computed_tokens based on the vLLM version. For vLLM 0.19.0, it uses these direct attributes, while for other versions, it maintains legacy support by setting and retrieving prefill_stats. However, the use of exact version matching via vllm_version_is("0.19.0") is highly problematic as it will fail for any patch releases (e.g., 0.19.1) or future minor/major versions, potentially causing regressions or runtime errors when the code reverts to legacy paths on newer vLLM releases.
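The version-gated pattern described above can be sketched roughly as follows. The `vllm_version_is` helper, the `num_cached_tokens`/`num_external_computed_tokens` attributes, and the `prefill_stats` field are named in the PR; the `Request` container, the stats dictionary layout, and the module-level `VLLM_VERSION` constant are simplified stand-ins for illustration, not the real vLLM types:

```python
# Simplified stand-in for the scheduler change under review.
# VLLM_VERSION is assumed to be resolved at import time in the real code.
VLLM_VERSION = "0.19.0"


def vllm_version_is(target: str) -> bool:
    """Exact string comparison, mirroring the helper in vllm_ascend/utils.py."""
    return VLLM_VERSION == target


class Request:
    """Hypothetical minimal request object, not the real vLLM Request."""

    def __init__(self, num_computed_tokens: int):
        self.num_computed_tokens = num_computed_tokens
        self.num_cached_tokens = -1          # -1 means "not yet counted"
        self.num_external_computed_tokens = 0
        self.prefill_stats = None            # legacy field used on older vLLM


def record_prefix_cache_stats(request: Request) -> None:
    # Count the number of prefix cached tokens.
    if request.num_cached_tokens < 0:
        request.num_cached_tokens = request.num_computed_tokens
    if vllm_version_is("0.19.0"):
        # vLLM 0.19.0 exposes the counters directly on the request,
        # so there is nothing further to record.
        return
    # Older vLLM: fall back to the legacy prefill_stats path.
    request.prefill_stats = {
        "num_cached_tokens": request.num_cached_tokens,
        "num_external_computed_tokens": request.num_external_computed_tokens,
    }
```

On 0.19.0 the function returns after populating the direct attributes, while on older versions it additionally mirrors them into `prefill_stats` for consumers that still read the legacy field.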
Suggested PR Title:
[Ops][BugFix] Handle num_cached_tokens/num_external_computed_tokens for different vllm version

Suggested PR Summary:
### What this PR does / why we need it?
This PR introduces version-specific logic to handle `num_cached_tokens` and `num_external_computed_tokens` in the scheduler, ensuring compatibility with vLLM 0.19.0 and maintaining legacy support for older versions via `prefill_stats`.
Fixes https://github.com/vllm-project/vllm/pull/37460
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
CI passed with existing tests.

    # Count the number of prefix cached tokens.
    if request.num_cached_tokens < 0:
        request.num_cached_tokens = request.num_computed_tokens
    if vllm_version_is("0.19.0"):
The vllm_version_is("0.19.0") check uses exact equality (as defined in vllm_ascend/utils.py). This logic will return False for any subsequent versions (e.g., 0.19.1, 0.20.0), causing the scheduler to revert to the legacy prefill_stats path. If the API changes introduced in 0.19.0 persist in later versions, this will lead to runtime errors. Consider using a version comparison (e.g., >= 0.19.0) to ensure future compatibility.
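The reviewer's suggestion can be sketched as a minimum-version check that compares parsed versions rather than exact strings. The helper name `vllm_version_at_least` is hypothetical (the repository only defines the exact-match `vllm_version_is`), and for production code `packaging.version.Version` would handle pre-releases and build metadata more robustly than this minimal tuple parse:

```python
def _parse(version: str) -> tuple:
    """Split a 'major.minor.patch' string into an int tuple for comparison."""
    return tuple(int(part) for part in version.split("."))


def vllm_version_at_least(installed: str, target: str) -> bool:
    """Return True when the installed vLLM is `target` or newer.

    Unlike exact string equality, this keeps returning True for patch
    releases (0.19.1) and later minors (0.20.0), so the new code path
    is not silently abandoned when vLLM is upgraded.
    """
    return _parse(installed) >= _parse(target)
```

With this shape the gate would read `if vllm_version_at_least(installed, "0.19.0"):`, and only genuinely older releases would fall back to the legacy `prefill_stats` path.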
    # Count the number of prefix cached tokens.
    if request.num_cached_tokens < 0:
        request.num_cached_tokens = num_computed_tokens
    if vllm_version_is("0.19.0"):
        continue
    request.num_external_computed_tokens = ext_tokens
    if vllm_version_is("0.19.0"):
…fferent vllm version (vllm-project#8426)

### What this PR does / why we need it?
This fixes vllm-project/vllm#37460.

This PR introduces version-specific logic to handle `num_cached_tokens` and `num_external_computed_tokens` in the scheduler, ensuring compatibility with vLLM 0.19.0 and maintaining legacy support for older versions via `prefill_stats`.

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
- vLLM version: v0.19.0
- vLLM main: vllm-project/vllm@6f786f2

Signed-off-by: wangli <wangli858794774@gmail.com>
What this PR does / why we need it?
This fixes vllm-project/vllm#37460.
This PR introduces version-specific logic to handle `num_cached_tokens` and `num_external_computed_tokens` in the scheduler, ensuring compatibility with vLLM 0.19.0 and maintaining legacy support for older versions via `prefill_stats`.

Does this PR introduce any user-facing change?
How was this patch tested?