
[FIX_FOR_VLLM_CUSTOM=14acf429ac08b6d538ca6feb3e06b6d13895804d] Fix CPUOffloadingSpec import path and remove obsolete roberta patch#1229

Merged
iboiko-habana merged 5 commits into vllm-project:main from pawel-olejniczak:fix/vllm-hourly-cpu-offload-roberta
Mar 30, 2026
Conversation

@pawel-olejniczak pawel-olejniczak commented Mar 24, 2026

Summary

Multiple upstream vLLM changes broke the hourly CI (RED since 2026-03-23 13:07 UTC):

  1. CPUOffloadingSpec import path — upstream PR #37874 refactored cpu.py into a cpu/ package
  2. replace_roberta_positions removed — upstream PR #37884
  3. vllm_is_batch_invariant removed — replaced with envs.VLLM_BATCH_INVARIANT
  4. key_cache guard for None — upstream decode path can pass None
  5. Synapse SDPA error 400 — continuation prefills triggered Synapse errors
  6. Attention.kv_cache list-to-element refactor — upstream PR #37487 (c59a132f9) changed Attention.kv_cache from list to tensor. HPU code used self.kv_cache[0] producing garbage output.

Changes

  • cpu_hpu.py: Updated CPUOffloadingSpec import path
  • models/roberta.py: Removed obsolete monkey-patch
  • init.py: Removed roberta import
  • vllm_gaudi_batch_invariant.py: Replace vllm_is_batch_invariant with envs.VLLM_BATCH_INVARIANT
  • ops/hpu_paged_attn.py: Guard decode path against None key_cache
  • attention/hpu_attn.py: Fix SDPA padding for continuation prefills
  • ops/hpu_attention.py: self.kv_cache[0] -> self.kv_cache (fixes #6, "Fix CI fail hang")
  • attention/oot_mla.py: self.kv_cache[0] -> self.kv_cache (fixes #6, "Fix CI fail hang")

Impact

Fixes all 50+ e2e test failures and restores correct model output on HPU.


AI-assisted: All changes reviewed and verified on HPU hardware.

Copilot AI left a comment


Pull request overview

Fixes breakages caused by recent upstream vLLM refactors by updating import paths and removing an obsolete RoBERTa monkey-patch.

Changes:

  • Update CPUOffloadingSpec import to the new vllm.v1.kv_offload.cpu.spec module location.
  • Remove the now-obsolete RoBERTa forward monkey-patch and stop importing it during model registration.
  • Leave a short note in roberta.py explaining why the patch was removed.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py Switches to the new upstream import path for CPUOffloadingSpec.
vllm_gaudi/models/roberta.py Removes the previous monkey-patch implementation and replaces it with an explanatory note.
vllm_gaudi/init.py Stops importing the removed RoBERTa patch module during model registration.

Comment thread vllm_gaudi/v1/kv_offload/worker/cpu_hpu.py
Comment thread vllm_gaudi/models/roberta.py Outdated
@pawel-olejniczak pawel-olejniczak force-pushed the fix/vllm-hourly-cpu-offload-roberta branch from 248cbdd to 7377dd7 Compare March 24, 2026 12:03
@pawel-olejniczak pawel-olejniczak changed the title [FIX_FOR_VLLM_LATEST] Fix CPUOffloadingSpec import path and remove obsolete roberta patch [FIX_FOR_VLLM_CUSTOM=14acf429ac08b6d538ca6feb3e06b6d13895804d] Fix CPUOffloadingSpec import path and remove obsolete roberta patch Mar 24, 2026
if token_type_ids is not None:
assert self.roberta.config.vocab_size < (1 << TOKEN_TYPE_SHIFT)
assert input_ids is not None
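For context, the assertion above relates to packing token-type ids into the token ids themselves: the check `vocab_size < (1 << TOKEN_TYPE_SHIFT)` only makes sense if token_type_ids are folded into the high bits of input_ids above the vocab range. A minimal sketch of that packing idea, with an illustrative shift value and hypothetical helper names (not the actual upstream implementation):

```python
TOKEN_TYPE_SHIFT = 30  # illustrative; requires vocab_size < (1 << TOKEN_TYPE_SHIFT)

def encode_token_type_ids(input_ids, token_type_ids):
    # Pack each token-type id into the high bits of the token id.
    return [tok | (tt << TOKEN_TYPE_SHIFT)
            for tok, tt in zip(input_ids, token_type_ids)]

def decode_token_type_ids(packed):
    # Recover both streams: low bits are the token id, high bits the type.
    mask = (1 << TOKEN_TYPE_SHIFT) - 1
    return [p & mask for p in packed], [p >> TOKEN_TYPE_SHIFT for p in packed]

ids, tts = decode_token_type_ids(encode_token_type_ids([5, 7], [0, 1]))
print(ids, tts)  # [5, 7] [0, 1]
```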

Collaborator

roberta.py was added in #1001 for special handling of _encode_token_type_ids. After roberta.py is removed, _encode_token_type_ids(input_ids, token_type_ids) from the upstream forward function will be used instead. Let's wait for the RoBERTa model test results.

Contributor Author

I have similar concerns here. Let’s wait for the test results.

@github-actions

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

@pawel-olejniczak pawel-olejniczak force-pushed the fix/vllm-hourly-cpu-offload-roberta branch 7 times, most recently from 12464ac to 016dcc3 Compare March 30, 2026 07:33
…solete roberta patch

- Update CPUOffloadingSpec import from vllm.v1.kv_offload.cpu to
  vllm.v1.kv_offload.cpu.spec (upstream PR #37874 refactored cpu.py
  into a cpu/ package)
- Remove roberta monkey-patch that called the now-deleted
  replace_roberta_positions function (upstream PR #37884 moved the
  position offset adjustment into RobertaEmbedding.forward())
- Remove corresponding roberta import from register_models()

Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
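One way to absorb a module-to-package refactor like this without hard-pinning to either vLLM version is a try/except import fallback. The sketch below simulates the idea with stand-in modules (the `fakevllm` package is invented so the example is self-contained; it is not the real vllm layout):

```python
import sys
import types

# Stand-in modules simulating the refactor of cpu.py into a cpu/ package.
class CPUOffloadingSpec:
    pass

spec_mod = types.ModuleType("fakevllm.kv_offload.cpu.spec")
spec_mod.CPUOffloadingSpec = CPUOffloadingSpec
for name in ("fakevllm", "fakevllm.kv_offload", "fakevllm.kv_offload.cpu"):
    sys.modules.setdefault(name, types.ModuleType(name))
sys.modules["fakevllm.kv_offload.cpu.spec"] = spec_mod

# Compatibility import: prefer the new package location, fall back to
# the old flat module when running against an older tree.
try:
    from fakevllm.kv_offload.cpu.spec import CPUOffloadingSpec as Spec
except ImportError:
    from fakevllm.kv_offload.cpu import CPUOffloadingSpec as Spec

print(Spec is CPUOffloadingSpec)  # True
```

The merged fix simply switches to the new path rather than keeping a fallback, which is the right call for a plugin that tracks a pinned upstream commit.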
…e removed vllm_is_batch_invariant with envs.VLLM_BATCH_INVARIANT

Upstream vLLM PR #35007 removed the vllm_is_batch_invariant() function
from batch_invariant.py, replacing it with a direct envs read.
Update vllm-gaudi to match.

Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
Co-authored-by: GitHub Copilot
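The replacement pattern amounts to reading the flag directly instead of calling a helper. A minimal sketch, with illustrative parsing (vLLM's `envs` module applies its own conversion rules):

```python
import os

def batch_invariant_enabled() -> bool:
    # Direct env read replacing the removed vllm_is_batch_invariant() helper;
    # the accepted truthy values here are illustrative.
    return os.environ.get("VLLM_BATCH_INVARIANT", "0") in ("1", "true", "True")

os.environ["VLLM_BATCH_INVARIANT"] = "1"
print(batch_invariant_enabled())  # True
```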
…decode path against None key_cache

During V1 warmup with LoRA or KV-offloading, the decode path can be
called before KV caches are bound. flat_pa crashes with
AttributeError on key_cache.shape when key_cache is None.

Add a None check in the decode path of HPUAttentionImpl.forward to
return zeros when key_cache is not available, matching the defensive
pattern already used in the prompt path.

Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
Co-authored-by: GitHub Copilot
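The defensive pattern described above can be sketched as follows. Names and shapes are illustrative (plain lists stand in for tensors), not the actual HPUAttentionImpl signature:

```python
def decode_forward(query, key_cache):
    # During V1 warmup (e.g. with LoRA or KV offloading) the decode path
    # can run before KV caches are bound, so key_cache may be None.
    if key_cache is None:
        # Return zeros of the output shape instead of crashing on
        # key_cache.shape, matching the prompt path's defensive pattern.
        return [[0.0] * len(q) for q in query]
    # ... real paged-attention decode would go here ...
    return query

query = [[1.0, 2.0], [3.0, 4.0]]
print(decode_forward(query, None))  # [[0.0, 0.0], [0.0, 0.0]]
```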
…_cache access after upstream list-to-element refactor

Upstream vLLM commit c59a132f9 (#37487) changed Attention.kv_cache from a
list of tensors to a single tensor. The HPU attention and MLA attention code
accessed self.kv_cache[0] which now returns the first sub-tensor slice
instead of the intended KV cache tensor, causing corrupted inference results.

Fix: Replace self.kv_cache[0] with self.kv_cache in both affected files.

Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
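The failure mode is easy to reproduce in miniature. The sketch below uses nested lists as stand-ins for tensors to show why `self.kv_cache[0]` silently returned a sub-slice after the list-to-element refactor:

```python
kv_tensor = [[1.0, 2.0], [3.0, 4.0]]  # stand-in for the bound KV cache tensor

# Before upstream #37487: kv_cache was a list wrapping the tensor,
# so [0] unwrapped it correctly.
old_style = [kv_tensor]
assert old_style[0] is kv_tensor

# After the refactor: kv_cache IS the tensor, so [0] now slices off the
# first sub-tensor instead, silently corrupting attention results.
new_style = kv_tensor
assert new_style[0] == [1.0, 2.0]  # a sub-slice, not the cache

# The fix: use the attribute directly.
assert new_style is kv_tensor
```

No exception is raised in the buggy case, which is why the symptom was garbage output rather than a crash.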
@pawel-olejniczak pawel-olejniczak force-pushed the fix/vllm-hourly-cpu-offload-roberta branch from 016dcc3 to ea814d0 Compare March 30, 2026 12:14
…_cache indexing in Qwen3.5 GatedDeltaNet

self.kv_cache is already a tuple (conv_state, ssm_state) assigned
by the HPU model runner. The redundant intermediate index
self.kv_cache[0][0/1] collapsed conv_state from 3-D to 2-D,
causing an IndexError during Dynamo tracing.

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
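The over-indexing bug can be sketched the same way, again with nested lists standing in for the state tensors (shapes are illustrative):

```python
conv_state = [[[0.0] * 4] * 2] * 3   # 3-D stand-in
ssm_state = [[0.0] * 4] * 2          # 2-D stand-in
kv_cache = (conv_state, ssm_state)   # already the (conv, ssm) tuple

# Buggy: kv_cache[0][0] indexes INTO conv_state, collapsing 3-D to 2-D,
# which later surfaced as an IndexError during Dynamo tracing.
buggy_conv = kv_cache[0][0]
assert len(buggy_conv) == 2  # lost the leading dimension

# Fixed: unpack the tuple directly, no redundant intermediate index.
conv, ssm = kv_cache
assert conv is conv_state and ssm is ssm_state
```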
@pawel-olejniczak pawel-olejniczak force-pushed the fix/vllm-hourly-cpu-offload-roberta branch from ea814d0 to 963be20 Compare March 30, 2026 12:16
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
14acf429ac08b6d538ca6feb3e06b6d13895804d

@iboiko-habana iboiko-habana merged commit 0fffded into vllm-project:main Mar 30, 2026
78 of 90 checks passed
adobrzyn added a commit that referenced this pull request Apr 1, 2026
…] Fix CPUOffloadingSpec import path and remove obsolete roberta patch (#1229)"

This reverts commit 0fffded.