-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
Support block size of 256 used by Intel HPU #26883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds 256 to the BlockSize type to support Intel HPU. While the change is simple, it exposes a potential issue. By adding 256 to the globally-defined BlockSize Literal, it becomes a valid option for all platforms, not just Intel HPU. This could lead to misconfiguration and runtime errors on platforms that do not support this block size, such as CUDA. I have added a review comment suggesting that this change should be accompanied by a validation mechanism to ensure platform compatibility, similar to how cache_dtype is handled.
| logger = init_logger(__name__) | ||
|
|
||
| BlockSize = Literal[1, 8, 16, 32, 64, 128] | ||
| BlockSize = Literal[1, 8, 16, 32, 64, 128, 256] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding 256 to the BlockSize Literal makes it a seemingly valid option for all platforms, but it is only intended for Intel HPU. This can be misleading for users of other platforms like CUDA, where this block size may not be supported and could lead to runtime errors.
A more robust approach would be to add platform-specific validation for block_size. For example, similar to how is_kv_cache_dtype_supported validates cache_dtype, a new method could be introduced in the Platform interface to validate block_size.
Since this change increases the risk of misconfiguration on some platforms, it would be much safer to accompany this change with a validation mechanism to prevent runtime failures.
|
Please fix DCO |
yewentao256
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work! Please also merge from main to fix some of the CI issue
Head branch was pushed to by a user without write access
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: mandy-li <[email protected]>
Signed-off-by: mandy-li <[email protected]>
Signed-off-by: mandy-li <[email protected]>
Signed-off-by: mandy-li <[email protected]> Signed-off-by: Alberto Perdomo <[email protected]>
Signed-off-by: mandy-li <[email protected]>
Signed-off-by: mandy-li <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Signed-off-by: mandy-li <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Signed-off-by: mandy-li <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Signed-off-by: mandy-li <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Signed-off-by: mandy-li <[email protected]>
Signed-off-by: mandy-li <[email protected]>
Signed-off-by: mandy-li <[email protected]>
This PR adds block size of 256 to the list which is used by Intel HPU fp8 models