Closed
29 commits
7a4d2fe
Mrope accuracy fix for qwen (#1437)
hsubramony May 15, 2026
f1abfec
Fix for MoE refactor #35178 (#1442)
iboiko-habana May 15, 2026
9566f70
fix: HPU-specific bug fixes for KV-offload + async spec-decode (#1264…
hsubramony May 18, 2026
397f562
[DOC] Fix torchaudio version (#1425)
yangulei May 18, 2026
252970e
Harden Qwen3.5 CI test to detect regressions (#1443)
shepark May 18, 2026
e9b8f08
Fix decode bucket filter issues from #1122 (#1447)
yangulei May 18, 2026
e5b23b2
Fix mamba_type comparison for GDN hybrid cache allocation (#1449)
shepark May 18, 2026
27c367b
fix: replace batched_count_greater_than to avoid dynamic shape TypeEr…
kamil-kaczor May 18, 2026
4d6d38c
fix: bypass _forward_impl for dp_size==1 to fix DeepSeek R1 FP8 crash…
kamil-kaczor May 18, 2026
fc43efa
Remove num_ctx_tokens_less_or_equal_batched_max_model_len filter (#1454)
yangulei May 18, 2026
a331930
fix kernel block size, port of #1439 (#1453)
iboiko-habana May 19, 2026
0e58506
fix: hybrid model warmup block_size mismatch (Qwen3.5-35B-A3B) (#1434)
adobrzyn May 19, 2026
d999b2e
Add Qwen3NextForCausalLM to mamba_like_arch (#1450)
rsmyrek May 19, 2026
c0a59cf
[FIX_FOR_VLLM_CUSTOM=dcacdf9a8860a86401127d1c8f93ebf3cfbfd026] Fix Mu…
pawel-olejniczak May 19, 2026
8c5008e
Fix patch_hf3fs_mock_client_for_cpu_only (#1439)
hsubramony May 19, 2026
a4150f5
Increase timeout from default 6h to 12h (#1464)
bmyrcha May 19, 2026
56474c1
Removal of ray and redundant transformers packages from gaudi require…
iboiko-habana May 20, 2026
bb9ca73
[FIX_FOR_VLLM_CUSTOM=a78b842d0e85d287176031334f4721cd96b6e47d] Fix of…
pawel-olejniczak May 21, 2026
dc459b8
Add pre-merge-approval for execute_pre_merge (#1471)
bmyrcha May 21, 2026
ca2d952
ci: route HF_TOKEN-using jobs through approved-workflow environment (…
adobrzyn May 21, 2026
7b7bc8f
[FIX_FOR_VLLM_CUSTOM=0a54df28471be07b3d668ea21c5e411569d3baea] Fix Dy…
pawel-olejniczak May 22, 2026
2cb5d99
Fix stale gate ref overriding caller router_logits in dp_size==1 MoE …
iboiko-habana May 22, 2026
adce75b
Update lora tests (#1488)
iboiko-habana May 25, 2026
87aef6c
Fix HPU prompt_token_ids device placement for penalty sampling (#1465)
yeonsily May 25, 2026
bc4f535
Fix decode bucket generation for hybrid models with mismatched block …
yangulei May 27, 2026
bf8dfdf
[FIX_FOR_VLLM_CUSTOM=b06813e87207e15b133e903d641e03f237d85b17] Fix gd…
pawel-olejniczak May 27, 2026
d8af506
Revert "Skip materialised causal attn_bias on FSDPA for non-GDN hybri…
rsmyrek May 27, 2026
78b3b3d
Fix accuracy issue in minimax_m2 with TP > 1
skavulya May 16, 2026
7f2a309
Fix lint format issues
skavulya May 27, 2026
5 changes: 5 additions & 0 deletions .github/workflows/create-release-branch.yaml
@@ -164,6 +164,7 @@ jobs:
needs: [prepare-release-branch, setup_and_build, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run pytest in tests/unit_tests
run: |
@@ -216,6 +217,7 @@ jobs:
needs: [prepare-release-branch, setup_and_build, discover_tests, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
strategy:
fail-fast: false
matrix:
@@ -248,6 +250,7 @@ jobs:
needs: [prepare-release-branch, setup_and_build, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run Data Parallel test
run: |
@@ -275,6 +278,7 @@ jobs:
needs: [prepare-release-branch, setup_and_build, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run PD disaggregate test
run: |
@@ -305,6 +309,7 @@ jobs:
needs: [prepare-release-branch, setup_and_build, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run Sharegpt performance tests with warmup
run: |
4 changes: 4 additions & 0 deletions .github/workflows/hourly-ci.yaml
@@ -101,6 +101,7 @@ jobs:
needs: [setup_and_build, discover_runner]
# <-- UPDATED: Runs on the specific runner
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run pytest in tests/unit_tests
run: |
@@ -157,6 +158,7 @@ jobs:
needs: [setup_and_build, discover_tests, discover_runner]
# <-- UPDATED: Runs on the specific runner
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
strategy:
fail-fast: false
matrix:
@@ -192,6 +194,7 @@ jobs:
needs: [setup_and_build, discover_runner]
# <-- UPDATED: Runs on the specific runner
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run Data Parallel test
run: |
@@ -220,6 +223,7 @@ jobs:
needs: [setup_and_build, discover_runner]
# <-- UPDATED: Runs on the specific runner
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
steps:
- name: Run PD disaggregate test
run: |
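The hunks above add `environment: approved-workflow` to the HPU test jobs; per commit ca2d952, the intent is presumably to route every job that uses `HF_TOKEN` through that approval environment. A minimal sketch of a consistency check for that rule, assuming the workflow file has already been parsed into a plain dict (for example with a YAML loader) and that a job "uses the token" when the string `HF_TOKEN` appears anywhere in its `env` or `steps`. The function names and the sample job names are hypothetical, not part of this repository.

```python
from typing import Any

REQUIRED_ENV = "approved-workflow"


def _mentions(value: Any, needle: str) -> bool:
    """Recursively search strings, dict keys/values and lists for `needle`."""
    if isinstance(value, str):
        return needle in value
    if isinstance(value, dict):
        return any(needle in str(k) or _mentions(v, needle) for k, v in value.items())
    if isinstance(value, list):
        return any(_mentions(item, needle) for item in value)
    return False


def jobs_missing_environment(workflow: dict[str, Any], secret: str = "HF_TOKEN",
                             required_env: str = REQUIRED_ENV) -> list[str]:
    """Return jobs that reference `secret` but do not declare `required_env`."""
    missing = []
    for name, job in workflow.get("jobs", {}).items():
        uses_secret = _mentions(job.get("env", {}), secret) or _mentions(job.get("steps", []), secret)
        if not uses_secret:
            continue
        env = job.get("environment")
        # `environment` may be a plain name or a mapping with a `name` key
        env_name = env.get("name") if isinstance(env, dict) else env
        if env_name != required_env:
            missing.append(name)
    return missing


# Hypothetical parsed workflow: only `job_c` uses the token without the environment.
workflow = {
    "jobs": {
        "job_a": {"environment": "approved-workflow",
                  "steps": [{"run": "pytest", "env": {"HF_TOKEN": "${{ secrets.HF_TOKEN }}"}}]},
        "job_b": {"steps": [{"run": "echo no secrets here"}]},
        "job_c": {"steps": [{"run": "HF_TOKEN=${{ secrets.HF_TOKEN }} ./run.sh"}]},
    }
}
print(jobs_missing_environment(workflow))  # ['job_c']
```

Scanning both keys and values catches the common `env: {HF_TOKEN: ...}` form as well as inline `HF_TOKEN=...` in `run` scripts; it can over-report if the string appears in a comment or echo.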
6 changes: 6 additions & 0 deletions .github/workflows/pre-merge-trigger.yaml
@@ -17,8 +17,14 @@ concurrency:
cancel-in-progress: true

jobs:
gate:
runs-on: ubuntu-latest
environment: pre-merge-approval
steps:
- run: echo "Approved"
execute_pre_merge:
runs-on: ubuntu-latest
needs: gate
timeout-minutes: 720
permissions:
actions: write # dispatch workflows, read run status, cancel orphaned runs
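In GitHub Actions, a job whose `needs` includes a job that failed or was skipped is itself skipped, unless its `if:` condition overrides that (for example `if: always()`, as the cleanup job at the end of `pre-merge.yaml` uses). That is what makes the new `gate` job effective: `execute_pre_merge` cannot start until `gate` succeeds, and `gate` targets the `pre-merge-approval` environment, which, assuming it is configured with a protection rule such as required reviewers, holds the job until it is approved. A simplified model of that rule; `will_run` is a hypothetical name, and the model ignores other `if:` expressions.

```python
from typing import Mapping


def will_run(needs_results: Mapping[str, str], always: bool = False) -> bool:
    """Simplified model of GitHub Actions `needs` gating.

    needs_results maps each upstream job name to its result:
    "success", "failure", "cancelled", or "skipped".
    """
    if always:
        # `if: always()` runs the job regardless of upstream results
        return True
    # Default status check: every upstream job must have succeeded
    return all(result == "success" for result in needs_results.values())


print(will_run({"gate": "success"}))  # True: execute_pre_merge starts
print(will_run({"gate": "failure"}))  # False: execute_pre_merge is skipped
```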
25 changes: 25 additions & 0 deletions .github/workflows/pre-merge.yaml
@@ -29,6 +29,7 @@ concurrency:
jobs:
retrieve_head_sha:
runs-on: ubuntu-latest
timeout-minutes: 720
outputs:
head_sha: ${{ steps.set_sha.outputs.head_sha }}
steps:
@@ -40,6 +41,7 @@ jobs:
gatekeeper:
needs: retrieve_head_sha
runs-on: ubuntu-latest
timeout-minutes: 720
permissions:
# Required to read the status of checks and PR details
checks: read
@@ -136,6 +138,7 @@ jobs:
discover_runner:
needs: gatekeeper
runs-on: ${{ inputs.use_hourly_runner == 'true' && 'hourly-ci' || 'pr-ci' }}
timeout-minutes: 720
outputs:
runner_name: ${{ steps.get_name.outputs.name }}
steps:
@@ -150,6 +153,7 @@ jobs:
needs: [discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
@@ -180,6 +184,7 @@ jobs:
discover_calibration_tests:
needs: [discover_runner, retrieve_head_sha]
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
@@ -207,6 +212,7 @@ jobs:
# This job runs in parallel with the build job
needs: [gatekeeper, retrieve_head_sha]
runs-on: ubuntu-latest
timeout-minutes: 720
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -235,6 +241,7 @@ jobs:
if: inputs.skip_tests != 'true'
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
permissions:
contents: read # Required to checkout code and read history
outputs:
@@ -354,6 +361,8 @@ jobs:
needs: [pre_merge_hpu_test_build, discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
steps:
- name: Run pytest in tests/unit_tests
run: |
@@ -378,6 +387,8 @@ jobs:
needs: [pre_merge_hpu_test_build, hpu_unit_tests, discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
steps:
- name: Run test scripts
run: |
@@ -408,6 +419,8 @@ jobs:
needs: [pre_merge_hpu_test_build, hpu_unit_tests, discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
steps:
- name: Run test scripts
run: |
@@ -433,6 +446,8 @@ jobs:
needs: [pre_merge_hpu_test_build, hpu_unit_tests, discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
steps:
- name: Run test scripts
run: |
@@ -459,6 +474,8 @@ jobs:
needs: [pre_merge_hpu_test_build, hpu_unit_tests, discover_tests, discover_runner, retrieve_head_sha]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
strategy:
fail-fast: false
matrix:
@@ -491,6 +508,8 @@ jobs:
calibration_tests:
needs: [pre_merge_hpu_test_build, hpu_unit_tests, discover_calibration_tests, discover_runner, retrieve_head_sha]
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
environment: approved-workflow
timeout-minutes: 720
strategy:
fail-fast: false
matrix:
@@ -522,6 +541,7 @@ jobs:
calibration_arg_parsing_tests:
needs: [pre_merge_hpu_test_build, discover_runner, retrieve_head_sha]
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
steps:
- name: Run calibration arg parsing tests
run: |
@@ -544,6 +564,7 @@ jobs:
needs: [retrieve_head_sha]
if: inputs.is_merge_group != 'true'
runs-on: ubuntu-latest
timeout-minutes: 720
outputs:
nixl_changed: ${{ steps.check.outputs.nixl_changed }}
steps:
@@ -571,6 +592,7 @@ jobs:
needs: [check_dockerfile_changes, discover_runner, retrieve_head_sha]
if: needs.check_dockerfile_changes.outputs.nixl_changed == 'true'
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
steps:
- name: Checkout repository
uses: actions/checkout@v4
@@ -595,6 +617,7 @@ jobs:
needs: [hpu_unit_tests, e2e, hpu_perf_tests, calibration_tests, calibration_arg_parsing_tests, discover_runner]
# --- UPDATED: Run on the specific node ---
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
# This job is required to pass for pre-merge CI. By itself it does nothing, and will only pass if all jobs specified in "needs" list pass.
steps:
- name: Succeeded if all previous jobs passed
@@ -605,6 +628,7 @@ jobs:
# This job runs after hpu-test-suite completes
needs: [pre_merge_hpu_test, pre_merge_hpu_test_build]
runs-on: ubuntu-latest
timeout-minutes: 720
permissions:
# Permissions are required on a per-job basis
pull-requests: write
@@ -624,6 +648,7 @@ jobs:
if: always()
needs: [discover_runner, hpu_unit_tests, hpu_pd_tests, hpu_perf_tests, hpu_dp_tests, e2e, calibration_tests, calibration_arg_parsing_tests]
runs-on: ${{ needs.discover_runner.outputs.runner_name }}
timeout-minutes: 720
steps:
- name: Remove Docker image to free up space
env:
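The `needs` lists in `pre-merge.yaml` determine the order in which its jobs can start. A sketch using Python's standard `graphlib` to derive one valid start order and the "waves" of jobs that become ready together, from a partial graph transcribed from the visible hunks. Jobs whose names are cut off above are left out, and `pre_merge_hpu_test_build` and `hpu_unit_tests`, which appear here only inside `needs` lists, are modelled without dependencies of their own. The real schedule also depends on `if:` conditions and runner availability.

```python
from graphlib import TopologicalSorter

# Partial `needs` graph transcribed from the pre-merge.yaml hunks above.
# Simplification: pre_merge_hpu_test_build and hpu_unit_tests are given no
# dependencies because their own `needs` lines are outside the visible hunks.
NEEDS: dict[str, list[str]] = {
    "retrieve_head_sha": [],
    "gatekeeper": ["retrieve_head_sha"],
    "discover_runner": ["gatekeeper"],
    "discover_calibration_tests": ["discover_runner", "retrieve_head_sha"],
    "pre_merge_hpu_test_build": [],
    "hpu_unit_tests": [],
    "calibration_tests": [
        "pre_merge_hpu_test_build", "hpu_unit_tests",
        "discover_calibration_tests", "discover_runner", "retrieve_head_sha",
    ],
    "calibration_arg_parsing_tests": [
        "pre_merge_hpu_test_build", "discover_runner", "retrieve_head_sha",
    ],
}


def execution_order(needs: dict[str, list[str]]) -> list[str]:
    """Return one valid start order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(needs).static_order())


def stages(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves whose dependencies are all satisfied together."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(stages(NEEDS), start=1):
    print(i, wave)
```

With this graph, `gatekeeper` and `discover_runner` form a strict chain, so every test job that runs on the discovered runner waits for the gatekeeper check first.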
3 changes: 2 additions & 1 deletion README.md
@@ -68,7 +68,8 @@ The vLLM Hardware Plugin for Intel® Gaudi® integrates [Intel® Gaudi® AI acce
5. Install torchaudio (required by some upstream vLLM models such as QWEN3_5). Use the CPU wheel with `--no-deps` to avoid pulling a conflicting CUDA torch:

```bash
-pip install --no-deps torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+TORCH_VERSION=$(python3 -c "import re, torch; print(re.match(r'(\d+\.\d+\.\d+)', torch.__version__).group(1))")
+pip install --no-deps torchaudio==$TORCH_VERSION --extra-index-url https://download.pytorch.org/whl/cpu
```

To see all the available installation methods, such as NIXL, see the [Installation](https://vllm-gaudi.readthedocs.io/en/latest/getting_started/installation.html) guide.
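The new README commands pin torchaudio to the installed torch release by keeping only the leading `MAJOR.MINOR.PATCH` of `torch.__version__`, which drops local or pre-release suffixes such as `+cpu`. The same regex, wrapped in a function so it can be tested without torch installed; `base_release` is a hypothetical helper name and the version strings below are illustrative.

```python
import re

# Same pattern as the README one-liner; re.match anchors it at the start.
_RELEASE = re.compile(r"(\d+\.\d+\.\d+)")


def base_release(version: str) -> str:
    """Return the leading MAJOR.MINOR.PATCH of a version string."""
    match = _RELEASE.match(version)
    if match is None:
        raise ValueError(f"no MAJOR.MINOR.PATCH prefix in {version!r}")
    return match.group(1)


print(base_release("2.7.1+cpu"))          # 2.7.1
print(base_release("2.9.0a0+gitabc123"))  # 2.9.0
```

Unlike the one-liner, which would fail with an `AttributeError` on an unexpected version string, this raises a `ValueError` that names the offending input.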
2 changes: 0 additions & 2 deletions requirements.txt
@@ -1,9 +1,7 @@
# Dependencies for HPU code
-ray>=2.48.0
pandas>=2.2.3
numba>=0.58.0
numpy>=1.26.0
-transformers >= 4.56.0, != 5.0.*, != 5.1.*, != 5.2.*, != 5.3.*, != 5.4.*, != 5.5.0, != 5.6.*
kaldi-native-fbank >= 1.18.7
decord >= 0.6.0
tblib==3.1.0
1 change: 1 addition & 0 deletions tests/full_tests/ci_e2e_discoverable_tests.sh
@@ -415,6 +415,7 @@ run_longbench_qwen3_30b_fp8_static_fp8_fsdpa_slicing_compile_test() {
run_gsm8k_qwen35_35b_a3b_test() {
echo "➡️ Testing GSM8K on Qwen3.5-35B-A3B..."
VLLM_SKIP_WARMUP=True ENABLE_APC=False VLLM_FUSED_BLOCK_SOFTMAX_ADJUSTMENT=False VLLM_GRAPH_RESERVED_MEM=0.8 \
VLLM_PROMPT_BS_BUCKET_MAX=32 \
pytest -v -s "${VLLM_GAUDI_PREFIX}/tests/models/language/generation/test_common.py" --model_card_path "${VLLM_GAUDI_PREFIX}/tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml"
echo "✅ Test with Qwen3.5-35B-A3B passed."
}
2 changes: 1 addition & 1 deletion tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml
@@ -15,4 +15,4 @@ model_card:

metrics:
name: exact_match,strict-match
-value: 0.75
+value: 0.9