fix kernel block size, port of #1439 by iboiko-habana · Pull Request #1453 · vllm-project/vllm-gaudi

iboiko-habana · 2026-05-18T08:23:59Z

Port of #1439.

16 is supported for testing/smaller models; 128 is the standard HPU
kernel block size; 528 is required for Granite 4.0-H
(granitemoehybrid) without prefix caching (16-token FA alignment),
and 768 is required with prefix caching (chunk-aligned).
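
As a quick arithmetic check on the sizes listed above, the sketch below tests each one for divisibility by 16 and by 128. Reading "16-token FA alignment" as "a multiple of 16" is an assumption made for illustration; the helper name `is_multiple_of` is hypothetical, and the chunk size behind "chunk-aligned" is not stated in this PR, so it is not modeled.

```python
def is_multiple_of(block_size: int, alignment: int) -> bool:
    """Return True when block_size is an exact multiple of alignment."""
    return block_size % alignment == 0


# Block sizes named in the PR description.
BLOCK_SIZES = (16, 128, 528, 768)

for size in BLOCK_SIZES:
    # Print divisibility by 16 (assumed FA alignment) and by 128
    # (the standard HPU kernel block size).
    print(size, is_multiple_of(size, 16), is_multiple_of(size, 128))
```

Under that reading, 528 is a multiple of 16 but not of 128, while 768 is a multiple of both.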

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

Copilot

Pull request overview

This PR updates the V1 HPU attention backend’s supported kernel block sizes to include 16-token blocks while preserving the existing 128/528/768-token support used by standard HPU and Granite hybrid configurations.

Changes:

- Adds 16 to `HPUAttentionBackendV1.get_supported_kernel_block_sizes()`.
- Updates the inline comment to describe the new smaller/testing block-size support.
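
The change summarized above can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the class name `HPUAttentionBackendV1Sketch`, the plain `List[int]` return type, and the helper `is_supported_block_size` are assumptions made for illustration. Only the method name and the four values come from the PR text.

```python
from typing import List


class HPUAttentionBackendV1Sketch:
    """Hypothetical stand-in for HPUAttentionBackendV1 (illustration only)."""

    @staticmethod
    def get_supported_kernel_block_sizes() -> List[int]:
        # 16:  testing / smaller models (the value this PR adds)
        # 128: standard HPU kernel block size
        # 528: Granite 4.0-H (granitemoehybrid) without prefix caching
        # 768: Granite 4.0-H with prefix caching
        return [16, 128, 528, 768]


def is_supported_block_size(block_size: int) -> bool:
    """Hypothetical helper: check a block size against the supported list."""
    sizes = HPUAttentionBackendV1Sketch.get_supported_kernel_block_sizes()
    return block_size in sizes
```

With this sketch, `is_supported_block_size(16)` is true, while a size that is not listed, such as 32, is rejected.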

github-actions · 2026-05-18T21:55:54Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

1) added in #1453: 16 is supported for testing/smaller models; 128 is the standard HPU kernel block size; 528 is required for Granite 4.0-H (granitemoehybrid) without prefix caching (16-token FA alignment), 768 with prefix caching (chunk-aligned).

2) _patch_hf3fs_mock_client_for_cpu_only: Upstream mock client unconditionally calls ``torch.cuda.current_stream().wait_event(event)`` in ``batch_write``. In environments where PyTorch is not compiled with CUDA, that path throws and the method returns ``-1`` for writes, causing connector unit tests to fail. This patch keeps the same behavior but skips CUDA synchronization when CUDA is unavailable.

---------

Signed-off-by: Harish Subramony <harish.subramony@intel.com>
Co-authored-by: Iryna Boiko <iryna.boiko@intel.com>
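
The CPU-only guard described in the commit message can be sketched without importing PyTorch by passing the CUDA-dependent pieces in as callables. Everything named here (`batch_write_sketch`, `write`, `cuda_available`, `wait_for_event`) is hypothetical and simplified; the real mock client and patch are not shown in this PR. In real code, `cuda_available` would plausibly be `torch.cuda.is_available` and `wait_for_event` a wrapper around `torch.cuda.current_stream().wait_event(event)`.

```python
from typing import Callable


def batch_write_sketch(
    write: Callable[[], int],
    cuda_available: Callable[[], bool],
    wait_for_event: Callable[[], None],
) -> int:
    """Perform a write, synchronizing on a CUDA event only if CUDA exists.

    Returns the write's result, or -1 if any step raises, mirroring the
    failure value mentioned in the commit message.
    """
    try:
        # Skip the synchronization entirely on CPU-only builds, so the
        # CUDA call can never raise there.
        if cuda_available():
            wait_for_event()
        return write()
    except Exception:
        return -1


def failing_sync() -> None:
    """Simulates the CUDA call raising on a build without CUDA."""
    raise RuntimeError("PyTorch not compiled with CUDA")


# Guarded path: CUDA reported unavailable, so the failing sync is never called.
guarded = batch_write_sketch(lambda: 0, lambda: False, failing_sync)

# Unguarded behavior: the sync is attempted, raises, and the write reports -1.
unguarded = batch_write_sketch(lambda: 0, lambda: True, failing_sync)
```

The second call reproduces the failure mode the commit message describes, and the first shows how checking availability before synchronizing avoids it.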

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

fix kernel block size, vllm-project#1439

b00f29b

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

Copilot AI review requested due to automatic review settings May 18, 2026 08:24

iboiko-habana requested review from PatrykWo, adobrzyn, afierka-intel, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 18, 2026 08:24

Copilot started reviewing on behalf of iboiko-habana May 18, 2026 08:24

Copilot AI reviewed May 18, 2026

github-actions Bot mentioned this pull request May 18, 2026

🚦 Team Review Dashboard #701

Open

kamil-kaczor approved these changes May 18, 2026

Merge branch 'main' into pr1439_port

38c7a9e

kamil-kaczor merged commit a331930 into vllm-project:main May 19, 2026
2 checks passed

iboiko-habana mentioned this pull request May 19, 2026

Fix patch_hf3fs_mock_client_for_cpu_only #1439

Merged

mgawarkiewicz-intel pushed a commit that referenced this pull request May 25, 2026

Port of: fix kernel block size, port of #1439 (#1453) (#1461)

1d3e1a5

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

kamil-kaczor merged 2 commits into vllm-project:main from iboiko-habana:pr1439_port

3 participants
