[bugfix](pcp) expand max_num_tokens for pcp pad #5478
zzzzwwjj merged 1 commit into vllm-project:main from
Conversation
Code Review
This pull request introduces a temporary workaround to adjust buffer sizes for Prefill Context Parallelism (PCP) by modifying max_num_batched_tokens before calling the parent constructor. While this approach is functional, it is not robust against exceptions that may occur during initialization. My review includes a suggestion to use a try...finally block to ensure the configuration is always restored to its original state, thereby improving the code's resilience and maintainability.
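A minimal sketch of the reviewer's suggestion. The class and attribute names below (`SchedulerConfig`, `ModelRunner`, `PCPModelRunner`, `buffer_size`) are illustrative stand-ins, not vllm-ascend's actual code; the point is the `try...finally` pattern that guarantees the config is restored even if the parent constructor raises.

```python
# Hypothetical sketch: temporarily enlarge a config value so the parent
# constructor sizes its buffers for PCP padding, then restore the original
# value in a finally block so it survives any initialization failure.

class SchedulerConfig:
    def __init__(self, max_num_batched_tokens: int):
        self.max_num_batched_tokens = max_num_batched_tokens


class ModelRunner:
    def __init__(self, config: SchedulerConfig):
        # The parent allocates buffers sized from the (possibly padded) value.
        self.buffer_size = config.max_num_batched_tokens


class PCPModelRunner(ModelRunner):
    def __init__(self, config: SchedulerConfig, pcp_size: int):
        original = config.max_num_batched_tokens
        # Pad so buffers can hold the extra tokens PCP padding introduces.
        config.max_num_batched_tokens = original * pcp_size
        try:
            super().__init__(config)
        finally:
            # Always restore, even if super().__init__ raises.
            config.max_num_batched_tokens = original


cfg = SchedulerConfig(max_num_batched_tokens=1024)
runner = PCPModelRunner(cfg, pcp_size=2)
print(runner.buffer_size)            # 2048: buffers sized for the padded value
print(cfg.max_num_batched_tokens)    # 1024: config restored after init
```

Without the `finally`, an exception inside the parent constructor would leave the shared config permanently inflated, which later code reading `max_num_batched_tokens` would silently inherit.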
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
### What this PR does / why we need it?

Since the [PR](vllm-project/vllm#28988) for PCP modifications to `GPUModelRunner` has not yet been merged into vLLM, this PR temporarily adjusts certain buffer sizes. These changes can be reverted once the original [PR](vllm-project/vllm#28988) is merged.

### Does this PR introduce _any_ user-facing change?

No

- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@5326c89

Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
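To make the buffer-size adjustment concrete, here is a hedged illustration of how PCP padding typically inflates a token budget: the count is rounded up to a multiple of the context-parallel size so every rank gets an equal shard. The function name and the round-up rule are assumptions for illustration, not vllm-ascend's actual implementation.

```python
# Hypothetical helper: round max_num_tokens up so it divides evenly
# across PCP ranks, since buffers must hold the padded total.

def pad_max_num_tokens(max_num_tokens: int, pcp_size: int) -> int:
    """Round up to the next multiple of pcp_size (ceiling division)."""
    return (max_num_tokens + pcp_size - 1) // pcp_size * pcp_size


print(pad_max_num_tokens(1000, 4))  # 1000: already a multiple of 4
print(pad_max_num_tokens(1001, 4))  # 1004: rounded up to the next multiple
```

Buffers sized from the unpadded value would be one padding's worth too small, which is why the PR expands `max_num_tokens` until the upstream `GPUModelRunner` changes land.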