Fix patch_hf3fs_mock_client_for_cpu_only #1439

Merged
iboiko-habana merged 9 commits into vllm-project:main from hsubramony:fix_kernel_block_sizes on May 19, 2026

@hsubramony (Contributor) commented May 12, 2026

  1. Add the kernel block size fix, port of #1439 #1453.
    16 is supported for testing/smaller models; 128 is the standard HPU
    kernel block size; 528 is required for Granite 4.0-H
    (granitemoehybrid) without prefix caching (16-token FA alignment),
    and 768 with prefix caching (chunk-aligned).

  2. _patch_hf3fs_mock_client_for_cpu_only
    The upstream mock client unconditionally calls
    torch.cuda.current_stream().wait_event(event) in batch_write.
    In environments where PyTorch is not compiled with CUDA, that call throws
    and the method returns -1 for writes, causing connector unit tests to
    fail. This patch keeps the same behavior but skips CUDA synchronization when
    CUDA is unavailable.
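
The idea in item 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `FakeMockClient`, `_sync_with_device`, `cuda_is_available`, and `patch_mock_client_for_cpu_only` are hypothetical names, and the `batch_write` signature and return values are assumed. Only `torch.cuda.is_available()` and `torch.cuda.current_stream().wait_event(event)` are real PyTorch calls.

```python
import importlib.util


def cuda_is_available() -> bool:
    """Return True only if torch is importable and reports a usable CUDA runtime."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


class FakeMockClient:
    """Hypothetical stand-in for the upstream mock client (not the real class)."""

    def __init__(self):
        self.storage = {}

    def _sync_with_device(self, event):
        # Mirrors the unconditional upstream call; raises on CPU-only builds.
        import torch
        torch.cuda.current_stream().wait_event(event)

    def batch_write(self, offsets, payloads, event=None):
        # Assumed signature: one payload per offset, one result code per write.
        try:
            self._sync_with_device(event)
            for offset, payload in zip(offsets, payloads):
                self.storage[offset] = payload
            return [len(payload) for payload in payloads]
        except Exception:
            # Any failure, including the CUDA sync, is reported as -1 per write.
            return [-1] * len(offsets)


def patch_mock_client_for_cpu_only(client_cls, is_available=cuda_is_available) -> bool:
    """Swap the sync step for a no-op when CUDA is unavailable.

    Returns True if the class was patched, False if it was left untouched.
    """
    if is_available():
        return False
    client_cls._sync_with_device = lambda self, event: None
    return True
```

In a test setup, calling `patch_mock_client_for_cpu_only(FakeMockClient)` once before the connector tests run would leave CUDA machines untouched while letting writes succeed on CPU-only builds. Isolating the sync in a helper method is a choice made for this sketch; the real patch may replace `batch_write` itself.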

Signed-off-by: Harish Subramony <harish.subramony@intel.com>

Copilot AI (Contributor) left a comment

Pull request overview

Expands the set of supported HPU V1 attention kernel block sizes to allow a smaller 16-token option (useful for testing/smaller models) while retaining the existing production sizes used for standard and Granite 4.0-H configurations.

Changes:

  • Add 16 to the supported kernel block size list for the HPU V1 attention backend.
  • Update the in-code comment to document when/why each block size is used (standard vs Granite 4.0-H, with/without prefix caching).
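
As a rough illustration of the change summarized above, the supported sizes and their documented purposes could be held in a small lookup with a validation helper. The sizes and descriptions come from the PR description; the names `KERNEL_BLOCK_SIZES`, `supported_kernel_block_sizes`, and `check_kernel_block_size` are hypothetical and are not the backend's actual API.

```python
# Block sizes and their purposes as stated in the PR description.
# The structure and names here are assumptions made for illustration.
KERNEL_BLOCK_SIZES = {
    16: "testing / smaller models",
    128: "standard HPU kernel block size",
    528: "Granite 4.0-H (granitemoehybrid) without prefix caching",
    768: "Granite 4.0-H (granitemoehybrid) with prefix caching",
}


def supported_kernel_block_sizes():
    """Return the supported block sizes in ascending order."""
    return sorted(KERNEL_BLOCK_SIZES)


def check_kernel_block_size(block_size: int) -> int:
    """Return block_size unchanged if supported, otherwise raise ValueError."""
    if block_size not in KERNEL_BLOCK_SIZES:
        raise ValueError(
            f"Unsupported kernel block size {block_size}; "
            f"expected one of {supported_kernel_block_sizes()}"
        )
    return block_size
```

Keeping the purpose next to each value serves the same goal as the updated in-code comment: a reader can see why a given size exists without consulting the PR history.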

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

@hsubramony hsubramony force-pushed the fix_kernel_block_sizes branch from 7e3bcb4 to 592b1c3 Compare May 13, 2026 18:09
@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@hsubramony hsubramony force-pushed the fix_kernel_block_sizes branch from 592b1c3 to 1550f30 Compare May 13, 2026 18:20
kamil-kaczor pushed a commit that referenced this pull request May 19, 2026
Signed-off-by: Iryna Boiko <iboiko@habana.ai>

@kamil-kaczor (Collaborator) left a comment

lgtm

@iboiko-habana changed the title from "fix kernel block size" to "Fix patch_hf3fs_mock_client_for_cpu_only" on May 19, 2026
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
54f548e9e58087f0155e4e164e416ad7efdfde6d

@iboiko-habana iboiko-habana merged commit 8c5008e into vllm-project:main May 19, 2026
2 checks passed
mgawarkiewicz-intel pushed a commit that referenced this pull request May 25, 2026
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
4 participants