Port of: fix kernel block size, port of #1439 (#1453) #1461

Merged
mgawarkiewicz-intel merged 2 commits into vllm-project:releases/v0.21.0 from iboiko-habana:port1453
May 25, 2026
Conversation

@iboiko-habana
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings May 19, 2026 09:30
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the HPU v1 attention backend’s declared supported kernel block sizes to include a smaller 16-token option, intended for testing and smaller models, while keeping the existing Granite 4.0-H related block sizes.

Changes:

  • Add 16 as a supported kernel block size for the HPU_ATTN_V1 backend.
  • Update the inline comment to document the rationale for the supported block sizes (testing, standard 128, and Granite hybrid requirements).
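The effect of the change can be illustrated with a minimal sketch. This is not the backend's code: `SUPPORTED_KERNEL_BLOCK_SIZES`, `is_supported`, and `pick_kernel_block_size` are hypothetical names, and the "largest supported size that divides the cache block size" selection rule is an assumption made for illustration. Only 16 and 128 appear in the list because those are the only sizes named in the summary; the Granite 4.0-H related sizes are not stated here and are left out rather than guessed.

```python
# Hypothetical stand-in for the block sizes a backend declares as supported.
# Only 16 (added by this PR) and 128 (standard) are named in the review
# summary; the Granite 4.0-H related sizes are intentionally omitted.
SUPPORTED_KERNEL_BLOCK_SIZES: list[int] = [16, 128]


def is_supported(block_size: int,
                 supported: list[int] = SUPPORTED_KERNEL_BLOCK_SIZES) -> bool:
    """Return True if the requested kernel block size is declared supported."""
    return block_size in supported


def pick_kernel_block_size(cache_block_size: int,
                           supported: list[int] = SUPPORTED_KERNEL_BLOCK_SIZES) -> int:
    """Pick the largest supported size that evenly divides the cache block size.

    The selection rule is an assumption for this sketch, not the documented
    behavior of any particular backend.
    """
    candidates = [size for size in supported if cache_block_size % size == 0]
    if not candidates:
        raise ValueError(
            f"no supported kernel block size divides {cache_block_size}")
    return max(candidates)


if __name__ == "__main__":
    # With 16 in the list, a small block size used in tests is accepted.
    print(is_supported(16))
    print(pick_kernel_block_size(16))
    print(pick_kernel_block_size(128))
```

Under these assumptions, a configuration with a cache block size of 16 would be rejected if the list held only 128, which is the kind of failure that adding 16 avoids for tests and smaller models.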

Collaborator

@kamil-kaczor left a comment

lgtm

@iboiko-habana temporarily deployed to pre-merge-approval May 25, 2026 08:34 with GitHub Actions (inactive)
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
ad7125a431e176d4161099480a66f0169609a690

@mgawarkiewicz-intel merged commit 1d3e1a5 into vllm-project:releases/v0.21.0 May 25, 2026
3 checks passed

Labels

None yet

Projects

None yet

4 participants