Fix patch_hf3fs_mock_client_for_cpu_only by hsubramony · Pull Request #1439 · vllm-project/vllm-gaudi

hsubramony · 2026-05-12T01:18:36Z

The kernel block size change was added in "fix kernel block size, port of #1439" (#1453):
16 is supported for testing and smaller models; 128 is the standard HPU
kernel block size; 528 is required for Granite 4.0-H
(granitemoehybrid) without prefix caching (16-token FA alignment),
and 768 is required with prefix caching (chunk-aligned).
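
To make the size rules above concrete, here is a minimal sketch of how a caller might pick one of the four values. The names `SUPPORTED_KERNEL_BLOCK_SIZES`, `choose_kernel_block_size`, and the `small_or_test` flag are hypothetical and only illustrate the description; they are not taken from the vllm-gaudi code.

```python
# Hypothetical names for illustration; only the four sizes and the conditions
# under which each applies come from the PR description.
SUPPORTED_KERNEL_BLOCK_SIZES = (16, 128, 528, 768)


def choose_kernel_block_size(
    model_type: str,
    prefix_caching: bool = False,
    small_or_test: bool = False,
) -> int:
    """Return a kernel block size following the rules stated in the description."""
    if model_type == "granitemoehybrid":
        # Granite 4.0-H: 768 with prefix caching, 528 without.
        size = 768 if prefix_caching else 528
    elif small_or_test:
        # Small option intended for testing and smaller models.
        size = 16
    else:
        # Standard HPU kernel block size.
        size = 128
    # Guard against drifting away from the supported list.
    assert size in SUPPORTED_KERNEL_BLOCK_SIZES
    return size
```

Under these assumptions, `choose_kernel_block_size("granitemoehybrid", prefix_caching=True)` selects 768, while any other model type falls back to 128 unless the small option is requested.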

_patch_hf3fs_mock_client_for_cpu_only
The upstream mock client unconditionally calls
torch.cuda.current_stream().wait_event(event) in batch_write.
In environments where PyTorch is not compiled with CUDA, that call raises an
exception and the method returns -1 for writes, causing connector unit tests to
fail. This patch keeps the same behavior but skips CUDA synchronization when
CUDA is unavailable.
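
The guard described above can be sketched as follows. `ToyMockClient`, `_wait_for_event`, and `patch_for_cpu_only` are hypothetical stand-ins written for this illustration, not the upstream mock client or the actual patch, and the CUDA failure is simulated with a plain `RuntimeError` so the sketch runs without PyTorch installed.

```python
from typing import Callable, Dict, List


def cuda_is_available() -> bool:
    """True only when torch is importable and reports a usable CUDA runtime."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


class ToyMockClient:
    """Hypothetical stand-in for the upstream mock client (not its real API)."""

    def __init__(self) -> None:
        self.store: Dict[int, bytes] = {}

    def _wait_for_event(self, event: object) -> None:
        # Upstream performs torch.cuda.current_stream().wait_event(event) here.
        # We simulate the failure seen on a build without CUDA.
        raise RuntimeError("Torch not compiled with CUDA enabled")

    def batch_write(self, offsets: List[int], payloads: List[bytes], event: object) -> List[int]:
        try:
            self._wait_for_event(event)
            for offset, payload in zip(offsets, payloads):
                self.store[offset] = payload
            return [len(p) for p in payloads]
        except Exception:
            # Any failure, including the sync step, is reported as -1 per write.
            return [-1] * len(offsets)


def patch_for_cpu_only(client_cls: type, is_available: Callable[[], bool] = cuda_is_available) -> None:
    """Wrap the sync step so it only runs when CUDA is available."""
    original_wait = client_cls._wait_for_event

    def guarded_wait(self, event: object) -> None:
        if is_available():
            original_wait(self, event)

    client_cls._wait_for_event = guarded_wait


client = ToyMockClient()
before = client.batch_write([0, 8], [b"abcd", b"ef"], event=None)
patch_for_cpu_only(ToyMockClient, is_available=lambda: False)
after = client.batch_write([0, 8], [b"abcd", b"ef"], event=None)
```

In this toy setup the write path is unchanged; only the synchronization step is skipped, which is why the second call stores the payloads instead of returning -1 for each write.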

Signed-off-by: Harish Subramony <harish.subramony@intel.com>

Copilot

Pull request overview

Expands the set of supported HPU V1 attention kernel block sizes to allow a smaller 16-token option (useful for testing and smaller models) while retaining the existing production sizes used for standard and Granite 4.0-H configurations.

Changes:

Add 16 to the supported kernel block size list for the HPU V1 attention backend.
Update the in-code comment to document when/why each block size is used (standard vs Granite 4.0-H, with/without prefix caching).

github-actions · 2026-05-13T11:19:16Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

github-actions · 2026-05-13T18:09:49Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

Signed-off-by: Harish Subramony <harish.subramony@intel.com>

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

kamil-kaczor

lgtm

github-actions · 2026-05-19T13:45:52Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

fix kernel block size

fddf12b

Signed-off-by: Harish Subramony <harish.subramony@intel.com>

Copilot AI review requested due to automatic review settings May 12, 2026 01:18

hsubramony requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 12, 2026 01:18

Copilot started reviewing on behalf of hsubramony May 12, 2026 01:19

Copilot AI reviewed May 12, 2026

github-actions Bot mentioned this pull request May 12, 2026

🚦 Team Review Dashboard #701

Open

hsubramony force-pushed the fix_kernel_block_sizes branch from 7e3bcb4 to 592b1c3 Compare May 13, 2026 18:09

hsubramony added 2 commits May 13, 2026 11:12

Merge remote-tracking branch 'origin/main' into fix_kernel_block_sizes

d2d1434

Fix HF3FS mock client CPU-only writes

1550f30

Signed-off-by: Harish Subramony <harish.subramony@intel.com>

hsubramony force-pushed the fix_kernel_block_sizes branch from 592b1c3 to 1550f30 Compare May 13, 2026 18:20

hsubramony and others added 5 commits May 14, 2026 09:39

Merge branch 'main' into fix_kernel_block_sizes

1af8a2a

Merge branch 'main' into fix_kernel_block_sizes

bdb9cef

Merge branch 'main' into fix_kernel_block_sizes

2d0cc77

Merge remote-tracking branch 'origin/main' into fix_kernel_block_sizes

5cabc4a

pre-commit fixes

a00768c

Signed-off-by: Harish Subramony <harish.subramony@intel.com>

kamil-kaczor pushed a commit that referenced this pull request May 19, 2026

fix kernel block size, port of #1439 (#1453)

a331930

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

Merge branch 'main' into fix_kernel_block_sizes

9531a4c

iboiko-habana mentioned this pull request May 19, 2026

fix kernel block size, port of #1439 #1453

Merged

kamil-kaczor approved these changes May 19, 2026

iboiko-habana approved these changes May 19, 2026

iboiko-habana changed the title ~~fix kernel block size~~ Fix patch_hf3fs_mock_client_for_cpu_only May 19, 2026

iboiko-habana merged commit 8c5008e into vllm-project:main May 19, 2026
2 checks passed

mgawarkiewicz-intel pushed a commit that referenced this pull request May 25, 2026

Port of: fix kernel block size, port of #1439 (#1453) (#1461)

1d3e1a5

Signed-off-by: Iryna Boiko <iboiko@habana.ai>

iboiko-habana merged 9 commits into vllm-project:main from hsubramony:fix_kernel_block_sizes

hsubramony commented May 12, 2026 • edited by iboiko-habana

4 participants