
Conversation

Contributor

@mandy-li mandy-li commented Oct 15, 2025

This PR adds a block size of 256 to the list of valid block sizes, which is used by Intel HPU fp8 models.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds 256 to the BlockSize type to support Intel HPU. While the change is simple, it exposes a potential issue. By adding 256 to the globally-defined BlockSize Literal, it becomes a valid option for all platforms, not just Intel HPU. This could lead to misconfiguration and runtime errors on platforms that do not support this block size, such as CUDA. I have added a review comment suggesting that this change should be accompanied by a validation mechanism to ensure platform compatibility, similar to how cache_dtype is handled.

logger = init_logger(__name__)

- BlockSize = Literal[1, 8, 16, 32, 64, 128]
+ BlockSize = Literal[1, 8, 16, 32, 64, 128, 256]
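The diff above widens the global BlockSize Literal. A minimal sketch of why this makes 256 pass as valid on every platform's config, not just HPU (the check helper here is hypothetical, not vLLM code):

```python
from typing import Literal, get_args

# The widened alias from the diff above.
BlockSize = Literal[1, 8, 16, 32, 64, 128, 256]

def check(block_size: int) -> bool:
    # Runtime membership test mirroring what a static type checker
    # accepts for the Literal: there is no per-platform distinction.
    return block_size in get_args(BlockSize)

print(check(256))  # True, even for a CUDA configuration
```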
Contributor

Severity: high

Adding 256 to the BlockSize Literal makes it a seemingly valid option for all platforms, but it is only intended for Intel HPU. This can be misleading for users of other platforms like CUDA, where this block size may not be supported and could lead to runtime errors.

A more robust approach would be to add platform-specific validation for block_size. For example, similar to how is_kv_cache_dtype_supported validates cache_dtype, a new method could be introduced in the Platform interface to validate block_size.

Since this change increases the risk of misconfiguration on some platforms, it would be much safer to accompany this change with a validation mechanism to prevent runtime failures.
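The validation mechanism suggested above could look roughly like this. All names here (Platform, supported_block_sizes, validate_block_size, and the subclasses) are hypothetical illustrations of the idea, not vLLM's actual interfaces:

```python
class Platform:
    # Block sizes the platform accepts; subclasses override as needed.
    supported_block_sizes: tuple[int, ...] = (1, 8, 16, 32, 64, 128)

    @classmethod
    def validate_block_size(cls, block_size: int) -> None:
        # Raise early at config time instead of failing at runtime.
        if block_size not in cls.supported_block_sizes:
            raise ValueError(
                f"block_size={block_size} is not supported on "
                f"{cls.__name__}; choose one of {cls.supported_block_sizes}"
            )

class CudaPlatform(Platform):
    pass  # inherits the default set, without 256

class HpuPlatform(Platform):
    supported_block_sizes = (1, 8, 16, 32, 64, 128, 256)

HpuPlatform.validate_block_size(256)   # accepted on HPU
CudaPlatform.validate_block_size(128)  # accepted everywhere
try:
    CudaPlatform.validate_block_size(256)  # 256 is not in CUDA's set
except ValueError as exc:
    print(exc)
```

This mirrors the pattern of is_kv_cache_dtype_supported mentioned above: the globally typed option stays permissive, while each platform rejects values it cannot serve.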

@hmellor hmellor enabled auto-merge (squash) October 15, 2025 07:44
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 15, 2025
@hmellor
Member

hmellor commented Oct 15, 2025

Please fix DCO

Member

@yewentao256 yewentao256 left a comment


Thanks for the work! Please also merge from main to fix some of the CI issues.

auto-merge was automatically disabled October 16, 2025 06:23

Head branch was pushed to by a user without write access

@mergify mergify bot added the gpt-oss (Related to GPT-OSS models), rocm (Related to AMD ROCm), and v1 labels Oct 16, 2025
@mergify mergify bot added the tpu (Related to Google TPUs) and tool-calling labels Oct 16, 2025

mergify bot commented Oct 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mandy-li.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 16, 2025
@mergify mergify bot removed the tpu (Related to Google TPUs) and needs-rebase labels Oct 16, 2025
@yewentao256 yewentao256 merged commit ac3ed5a into vllm-project:main Oct 16, 2025
45 checks passed
@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Oct 16, 2025
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ci/build, deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), llama (Related to Llama models), multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models), performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), tool-calling, v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants