
[FIX_FOR_VLLM_LATEST] adapt to upstream OffloadingSpec kv_cache_config refactor (GAUDISW-247307)#1159

Closed
tzielinski-habana wants to merge 2 commits into vllm-project:main from tzielinski-habana:fix/vllm-hourly-GAUDISW-247307

Conversation

@tzielinski-habana
Collaborator

Summary

Adapts vllm-gaudi to upstream vLLM commit cfaf4668f (PR #36610, "[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec"), which introduced breaking changes to the offloading connector APIs.

Changes

1. tests/unit_tests/kv_offload/test_offloading_connector.py

  • Create a KVCacheConfig and pass it to the OffloadingConnector constructor (now required by an upstream assertion)

2. vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py

  • Update monkey-patched get_handlers to use new attribute names:
    • self.gpu_block_size is now tuple[int, ...] → extract with self.gpu_block_size[0]
    • self.offloaded_block_size removed → compute as gpu_block_size * self.block_size_factor
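The block-size adaptation in item 2 can be sketched as follows. This is an illustrative stand-in, not the actual vllm-gaudi code: `PatchedSpec` and `resolve_block_sizes` are hypothetical names, but the attribute handling mirrors the change described above (upstream now stores `gpu_block_size` as `tuple[int, ...]`, one entry per KV group, and `offloaded_block_size` must be recomputed from `block_size_factor`).

```python
# Illustrative sketch of the patched get_handlers block-size logic.
# PatchedSpec stands in for the real OffloadingSpec; only the attribute
# names from the PR description (gpu_block_size, block_size_factor) are real.

class PatchedSpec:
    def __init__(self, gpu_block_size: tuple[int, ...], block_size_factor: int):
        self.gpu_block_size = gpu_block_size      # now tuple[int, ...], per KV group
        self.block_size_factor = block_size_factor

    def resolve_block_sizes(self) -> tuple[int, int]:
        # Single-KV-group assumption, matching the assert added in the patch.
        assert len(self.gpu_block_size) == 1
        gpu_block_size = self.gpu_block_size[0]
        # offloaded_block_size was removed upstream; recompute it instead.
        cpu_block_size = gpu_block_size * self.block_size_factor
        return gpu_block_size, cpu_block_size

spec = PatchedSpec(gpu_block_size=(128,), block_size_factor=4)
print(spec.resolve_block_sizes())  # (128, 512)
```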

Verified on HPU

  • offloading_connector tests: 9 passed on Gaudi 3 (g3)
  • CPU offloading engine init: LLM initialized successfully on Gaudi 3 (g3)

CI Run (Failure)

https://github.com/vllm-project/vllm-gaudi/actions/runs/23042890934

Jira

GAUDISW-247307

…efactor (GAUDISW-247307)

Signed-off-by: tzielinski-habana <tomasz.zielinski@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Updates vllm-gaudi KV offloading integration to match upstream vLLM API changes around OffloadingSpec and KVCacheConfig.

Changes:

  • Update Gaudi CPU↔HPU offloading handler creation to use new gpu_block_size representation and compute CPU block size via block_size_factor.
  • Adjust unit tests to construct and pass the required KVCacheConfig into OffloadingConnector.
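The test-side change can be illustrated with stand-in classes. None of the classes below are the real vllm types (the actual KVCacheConfig fields should be checked against upstream `vllm.v1.kv_cache_interface`); the sketch only shows why the old test broke: the upstream constructor now asserts that a kv_cache_config is supplied.

```python
# Illustrative stand-ins showing the new constructor requirement.
# FakeKVCacheConfig / FakeOffloadingConnector are hypothetical; the real
# upstream classes live in vLLM and take additional arguments.
from dataclasses import dataclass, field


@dataclass
class FakeKVCacheConfig:
    num_blocks: int
    kv_cache_groups: list = field(default_factory=list)


class FakeOffloadingConnector:
    def __init__(self, kv_cache_config):
        # Mirrors the upstream assertion that broke the old test.
        assert kv_cache_config is not None, "kv_cache_config is now required"
        self.kv_cache_config = kv_cache_config


# Old test style (no config) would now trip the assertion;
# the updated test builds a config first and passes it in:
connector = FakeOffloadingConnector(FakeKVCacheConfig(num_blocks=16))
print(connector.kv_cache_config.num_blocks)  # 16
```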

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py Adapts handler initialization to upstream gpu_block_size refactor and replaces removed offloaded_block_size usage.
tests/unit_tests/kv_offload/test_offloading_connector.py Updates tests to satisfy new OffloadingConnector constructor requirements by providing a KVCacheConfig.


```diff
 self.scheduler.update_from_output(scheduler_output, model_runner_output)

-if (prev_token_id is EOS_TOKEN_ID and prev_token_id != token_id and self.scheduler.requests):
+if prev_token_id is EOS_TOKEN_ID and prev_token_id != token_id and self.scheduler.requests:
```

vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py, lines 369 to 375:

```diff
 if not self._handlers:
+    assert len(self.gpu_block_size) == 1
+    gpu_block_size = self.gpu_block_size[0]
     self._handlers = CpuGpuOffloadingHandlers(
         attn_backends=attn_backends,
-        gpu_block_size=self.gpu_block_size,
-        cpu_block_size=self.offloaded_block_size,
+        gpu_block_size=gpu_block_size,
+        cpu_block_size=gpu_block_size * self.block_size_factor,
```
@tzielinski-habana changed the title from "fix: [vllm-hourly] adapt to upstream OffloadingSpec kv_cache_config refactor (GAUDISW-247307)" to "[FIX_FOR_VLLM_LATEST] adapt to upstream OffloadingSpec kv_cache_config refactor (GAUDISW-247307)" on Mar 13, 2026
Signed-off-by: tzielinski-habana <tomasz.zielinski@intel.com>
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0005d2a3c9ed8cf8bab4018b7064ceb4fd9548d1

