
[FIX_FOR_VLLM_LATEST] adapt to upstream OffloadingSpec kv_cache_config refactor (GAUDISW-247307)#1159

Closed
tzielinski-habana wants to merge 2 commits into vllm-project:main from tzielinski-habana:fix/vllm-hourly-GAUDISW-247307

Conversation

@tzielinski-habana
Collaborator

Summary

Adapts vllm-gaudi to upstream vLLM commit cfaf4668f (PR #36610, "[kv_offload+HMA][1/N]: Support multiple KV groups in OffloadingSpec"), which introduced breaking changes to the offloading connector APIs.

Changes

1. tests/unit_tests/kv_offload/test_offloading_connector.py

  • Create a KVCacheConfig and pass it to the OffloadingConnector constructor (now required by an upstream assertion)

2. vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py

  • Update monkey-patched get_handlers to use new attribute names:
    • self.gpu_block_size is now tuple[int, ...] → extract with self.gpu_block_size[0]
    • self.offloaded_block_size removed → compute as gpu_block_size * self.block_size_factor
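The block-size adaptation in item 2 can be sketched as follows. This is an illustrative stand-in, not the actual vllm-gaudi code: `PatchedSpec` and `resolve_block_sizes` are hypothetical names, but the attribute handling mirrors the change described above (upstream now stores `gpu_block_size` as `tuple[int, ...]`, one entry per KV group, and `offloaded_block_size` must be recomputed from `block_size_factor`).

```python
# Illustrative sketch of the patched get_handlers block-size logic.
# PatchedSpec stands in for the real OffloadingSpec; only the attribute
# names from the PR description (gpu_block_size, block_size_factor) are real.

class PatchedSpec:
    def __init__(self, gpu_block_size: tuple[int, ...], block_size_factor: int):
        self.gpu_block_size = gpu_block_size      # now tuple[int, ...], per KV group
        self.block_size_factor = block_size_factor

    def resolve_block_sizes(self) -> tuple[int, int]:
        # Single-KV-group assumption, matching the assert added in the patch.
        assert len(self.gpu_block_size) == 1
        gpu_block_size = self.gpu_block_size[0]
        # offloaded_block_size was removed upstream; recompute it instead.
        cpu_block_size = gpu_block_size * self.block_size_factor
        return gpu_block_size, cpu_block_size

spec = PatchedSpec(gpu_block_size=(128,), block_size_factor=4)
print(spec.resolve_block_sizes())  # (128, 512)
```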

Verified on HPU

  • offloading_connector tests: 9 passed on Gaudi 3 (g3)
  • CPU offloading engine init: LLM initialized successfully on Gaudi 3 (g3)

CI Run (Failure)

https://github.com/vllm-project/vllm-gaudi/actions/runs/23042890934

Jira

GAUDISW-247307

…efactor (GAUDISW-247307)

Signed-off-by: tzielinski-habana <tomasz.zielinski@intel.com>
Contributor

Copilot AI left a comment


Pull request overview

Updates vllm-gaudi KV offloading integration to match upstream vLLM API changes around OffloadingSpec and KVCacheConfig.

Changes:

  • Update Gaudi CPU↔HPU offloading handler creation to use new gpu_block_size representation and compute CPU block size via block_size_factor.
  • Adjust unit tests to construct and pass the required KVCacheConfig into OffloadingConnector.
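The test-side change can be illustrated with stand-in classes. None of the classes below are the real vllm types (the actual KVCacheConfig fields should be checked against upstream `vllm.v1.kv_cache_interface`); the sketch only shows why the old test broke: the upstream constructor now asserts that a kv_cache_config is supplied.

```python
# Illustrative stand-ins showing the new constructor requirement.
# FakeKVCacheConfig / FakeOffloadingConnector are hypothetical; the real
# upstream classes live in vLLM and take additional arguments.
from dataclasses import dataclass, field


@dataclass
class FakeKVCacheConfig:
    num_blocks: int
    kv_cache_groups: list = field(default_factory=list)


class FakeOffloadingConnector:
    def __init__(self, kv_cache_config):
        # Mirrors the upstream assertion that broke the old test.
        assert kv_cache_config is not None, "kv_cache_config is now required"
        self.kv_cache_config = kv_cache_config


# Old test style (no config) would now trip the assertion;
# the updated test builds a config first and passes it in:
connector = FakeOffloadingConnector(FakeKVCacheConfig(num_blocks=16))
print(connector.kv_cache_config.num_blocks)  # 16
```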

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py Adapts handler initialization to upstream gpu_block_size refactor and replaces removed offloaded_block_size usage.
tests/unit_tests/kv_offload/test_offloading_connector.py Updates tests to satisfy new OffloadingConnector constructor requirements by providing a KVCacheConfig.


```diff
 self.scheduler.update_from_output(scheduler_output, model_runner_output)

-if (prev_token_id is EOS_TOKEN_ID and prev_token_id != token_id and self.scheduler.requests):
+if prev_token_id is EOS_TOKEN_ID and prev_token_id != token_id and self.scheduler.requests:
```

vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py, lines 369 to 375:

```diff
 if not self._handlers:
+    assert len(self.gpu_block_size) == 1
+    gpu_block_size = self.gpu_block_size[0]
     self._handlers = CpuGpuOffloadingHandlers(
         attn_backends=attn_backends,
-        gpu_block_size=self.gpu_block_size,
-        cpu_block_size=self.offloaded_block_size,
+        gpu_block_size=gpu_block_size,
+        cpu_block_size=gpu_block_size * self.block_size_factor,
```
@tzielinski-habana changed the title from "fix: [vllm-hourly] adapt to upstream OffloadingSpec kv_cache_config refactor (GAUDISW-247307)" to "[FIX_FOR_VLLM_LATEST] adapt to upstream OffloadingSpec kv_cache_config refactor (GAUDISW-247307)" on Mar 13, 2026
Signed-off-by: tzielinski-habana <tomasz.zielinski@intel.com>
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
0005d2a3c9ed8cf8bab4018b7064ceb4fd9548d1

