Merged
Conversation
Collaborator
xuechendi
commented
Jul 3, 2025
- only left basic test to accelerate UT
- add try exception in CI, previously, for some reason, failed test stucked at release resource for very long. Fix it here
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
ad52fee to
4d961c1
Compare
iboiko-habana
pushed a commit
that referenced
this pull request
Mar 30, 2026
…UOffloadingSpec import path and remove obsolete roberta patch (#1229) ## Summary Multiple upstream vLLM changes broke the hourly CI (RED since 2026-03-23 13:07 UTC): 1. **CPUOffloadingSpec import path** — upstream PR #37874 refactored cpu.py into a cpu/ package 2. **replace_roberta_positions removed** — upstream PR #37884 3. **vllm_is_batch_invariant removed** — replaced with envs.VLLM_BATCH_INVARIANT 4. **key_cache guard for None** — upstream decode path can pass None 5. **Synapse SDPA error 400** — continuation prefills triggered Synapse errors 6. **Attention.kv_cache list-to-element refactor** — upstream PR #37487 (c59a132f9) changed Attention.kv_cache from list to tensor. HPU code used self.kv_cache[0] producing garbage output. ## Changes - cpu_hpu.py: Updated CPUOffloadingSpec import path - models/roberta.py: Removed obsolete monkey-patch - __init__.py: Removed roberta import - vllm_gaudi_batch_invariant.py: Replace vllm_is_batch_invariant with envs.VLLM_BATCH_INVARIANT - ops/hpu_paged_attn.py: Guard decode path against None key_cache - attention/hpu_attn.py: Fix SDPA padding for continuation prefills - **ops/hpu_attention.py**: self.kv_cache[0] -> self.kv_cache (fix #6) - **attention/oot_mla.py**: self.kv_cache[0] -> self.kv_cache (fix #6) ## HPU Verification (Gaudi 3, HL-325) **kv_cache fix A/B test:** - WITHOUT fix (self.kv_cache[0]): garbage output - WITH fix (self.kv_cache): correct coherent output **Remaining test issues (NOT caused by this PR):** - test_cpu_offloading: CPU offloading perf issue on HPU (separate) - test_llama_lora: 2/4 SQL outputs mismatch (was garbage without fix, now valid SQL) ## Impact Fixes ALL 50+ e2e test failures and restores correct model output on HPU. --- *AI-assisted: All changes reviewed and verified on HPU hardware.* --------- Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com> Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.