
Conversation

Contributor

@mandy-li mandy-li commented Oct 15, 2025

This PR adds a block size of 256 to the list of valid block sizes, which is used by Intel HPU fp8 models.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds 256 to the BlockSize type to support Intel HPU. While the change is simple, it exposes a potential issue. By adding 256 to the globally-defined BlockSize Literal, it becomes a valid option for all platforms, not just Intel HPU. This could lead to misconfiguration and runtime errors on platforms that do not support this block size, such as CUDA. I have added a review comment suggesting that this change should be accompanied by a validation mechanism to ensure platform compatibility, similar to how cache_dtype is handled.

logger = init_logger(__name__)

- BlockSize = Literal[1, 8, 16, 32, 64, 128]
+ BlockSize = Literal[1, 8, 16, 32, 64, 128, 256]
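The diff above widens the global BlockSize Literal. A minimal sketch of why this makes 256 pass as valid on every platform's config, not just HPU (the check helper here is hypothetical, not vLLM code):

```python
from typing import Literal, get_args

# The widened alias from the diff above.
BlockSize = Literal[1, 8, 16, 32, 64, 128, 256]

def check(block_size: int) -> bool:
    # Runtime membership test mirroring what a static type checker
    # accepts for the Literal: there is no per-platform distinction.
    return block_size in get_args(BlockSize)

print(check(256))  # True, even for a CUDA configuration
```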
Contributor

Severity: high

Adding 256 to the BlockSize Literal makes it a seemingly valid option for all platforms, but it is only intended for Intel HPU. This can be misleading for users of other platforms like CUDA, where this block size may not be supported and could lead to runtime errors.

A more robust approach would be to add platform-specific validation for block_size. For example, similar to how is_kv_cache_dtype_supported validates cache_dtype, a new method could be introduced in the Platform interface to validate block_size.

Since this change increases the risk of misconfiguration on some platforms, it would be much safer to accompany this change with a validation mechanism to prevent runtime failures.
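The validation mechanism suggested above could look roughly like this. All names here (Platform, supported_block_sizes, validate_block_size, and the subclasses) are hypothetical illustrations of the idea, not vLLM's actual interfaces:

```python
class Platform:
    # Block sizes the platform accepts; subclasses override as needed.
    supported_block_sizes: tuple[int, ...] = (1, 8, 16, 32, 64, 128)

    @classmethod
    def validate_block_size(cls, block_size: int) -> None:
        # Raise early at config time instead of failing at runtime.
        if block_size not in cls.supported_block_sizes:
            raise ValueError(
                f"block_size={block_size} is not supported on "
                f"{cls.__name__}; choose one of {cls.supported_block_sizes}"
            )

class CudaPlatform(Platform):
    pass  # inherits the default set, without 256

class HpuPlatform(Platform):
    supported_block_sizes = (1, 8, 16, 32, 64, 128, 256)

HpuPlatform.validate_block_size(256)   # accepted on HPU
CudaPlatform.validate_block_size(128)  # accepted everywhere
try:
    CudaPlatform.validate_block_size(256)  # 256 is not in CUDA's set
except ValueError as exc:
    print(exc)
```

This mirrors the pattern of is_kv_cache_dtype_supported mentioned above: the globally typed option stays permissive, while each platform rejects values it cannot serve.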

@hmellor hmellor enabled auto-merge (squash) October 15, 2025 07:44
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 15, 2025
@hmellor
Member

hmellor commented Oct 15, 2025

Please fix DCO

Member

@yewentao256 yewentao256 left a comment


Thanks for the work! Please also merge from main to fix some of the CI issues.

auto-merge was automatically disabled October 16, 2025 06:23

Head branch was pushed to by a user without write access

@mergify mergify bot added the gpt-oss (Related to GPT-OSS models), rocm (Related to AMD ROCm), and v1 labels Oct 16, 2025
@mergify mergify bot added the tpu (Related to Google TPUs) and tool-calling labels Oct 16, 2025

mergify bot commented Oct 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @mandy-li.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 16, 2025
@mergify mergify bot removed the tpu (Related to Google TPUs) and needs-rebase labels Oct 16, 2025
@yewentao256 yewentao256 merged commit ac3ed5a into vllm-project:main Oct 16, 2025
45 checks passed
@github-project-automation github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Oct 16, 2025
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
Zhathw pushed a commit to Zhathw/vllm that referenced this pull request Nov 12, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ci/build, deepseek (Related to DeepSeek models), documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), llama (Related to Llama models), multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models), performance (Performance-related issues), qwen (Related to Qwen models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm), tool-calling, v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants