Skip to content

Fix CI fail hang#6

Merged
xuechendi merged 1 commit intomainfrom
fix_ci_fail_hang
Jul 3, 2025
Merged

Fix CI fail hang#6
xuechendi merged 1 commit intomainfrom
fix_ci_fail_hang

Conversation

@xuechendi
Copy link
Copy Markdown
Collaborator

  1. only left basic test to accelerate UT
  2. add try exception in CI, previously, for some reason, failed test stucked at release resource for very long. Fix it here

Signed-off-by: Chendi Xue <chendi.xue@intel.com>
@xuechendi xuechendi merged commit f75ff7b into main Jul 3, 2025
1 check passed
@kzawora-intel kzawora-intel deleted the fix_ci_fail_hang branch July 10, 2025 15:48
iboiko-habana pushed a commit that referenced this pull request Mar 30, 2026
…UOffloadingSpec import path and remove obsolete roberta patch (#1229)

## Summary

Multiple upstream vLLM changes broke the hourly CI (RED since 2026-03-23
13:07 UTC):

1. **CPUOffloadingSpec import path** — upstream PR #37874 refactored
cpu.py into a cpu/ package
2. **replace_roberta_positions removed** — upstream PR #37884
3. **vllm_is_batch_invariant removed** — replaced with
envs.VLLM_BATCH_INVARIANT
4. **key_cache guard for None** — upstream decode path can pass None
5. **Synapse SDPA error 400** — continuation prefills triggered Synapse
errors
6. **Attention.kv_cache list-to-element refactor** — upstream PR #37487
(c59a132f9) changed Attention.kv_cache from list to tensor. HPU code
used self.kv_cache[0] producing garbage output.

## Changes

- cpu_hpu.py: Updated CPUOffloadingSpec import path
- models/roberta.py: Removed obsolete monkey-patch
- __init__.py: Removed roberta import
- vllm_gaudi_batch_invariant.py: Replace vllm_is_batch_invariant with
envs.VLLM_BATCH_INVARIANT
- ops/hpu_paged_attn.py: Guard decode path against None key_cache
- attention/hpu_attn.py: Fix SDPA padding for continuation prefills
- **ops/hpu_attention.py**: self.kv_cache[0] -> self.kv_cache (fix #6)
- **attention/oot_mla.py**: self.kv_cache[0] -> self.kv_cache (fix #6)

## HPU Verification (Gaudi 3, HL-325)

**kv_cache fix A/B test:**
- WITHOUT fix (self.kv_cache[0]): garbage output
- WITH fix (self.kv_cache): correct coherent output

**Remaining test issues (NOT caused by this PR):**
- test_cpu_offloading: CPU offloading perf issue on HPU (separate)
- test_llama_lora: 2/4 SQL outputs mismatch (was garbage without fix,
now valid SQL)

## Impact

Fixes ALL 50+ e2e test failures and restores correct model output on
HPU.

---
*AI-assisted: All changes reviewed and verified on HPU hardware.*

---------

Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant