[FIX_FOR_VLLM_CUSTOM=fc701c80588c215f84af0b745edcf4d127e276bc] Fix upstream regressions in HPU worker, MoE router, and offloading tests#1354
Merged
adobrzyn merged 3 commits on Apr 16, 2026
Conversation
Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
…router Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Contributor
Pull request overview
Fixes upstream-compat regressions in the Gaudi vLLM plugin by aligning (1) the HPU worker warmup/compile return type with the new vLLM CompilationTimes contract and (2) the HPU fused-MoE router factory signature/dispatch with upstream’s zero-expert routing refactor.
Changes:
- Update `HPUWorker.compile_or_warm_up_model()` to return a `CompilationTimes` NamedTuple (language_model + encoder) instead of a float.
- Extend the `create_fused_moe_router()` override to accept `zero_expert_type`/`num_logical_experts` and dispatch to `ZeroExpertRouter` when configured.
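The first change can be sketched as follows. `CompilationTimes` and `HPUWorkerSketch` here are stand-ins illustrating the per-component return contract described above, not the actual vLLM/vllm-gaudi definitions; the field names are assumptions.

```python
import time
from typing import NamedTuple


class CompilationTimes(NamedTuple):
    """Stand-in for the upstream vLLM NamedTuple (field names assumed)."""
    language_model: float
    encoder: float


class HPUWorkerSketch:
    def compile_or_warm_up_model(self) -> CompilationTimes:
        # Previously this returned a single float; the new upstream
        # contract expects separate timings for the language-model
        # backbone and the encoder.
        start = time.perf_counter()
        # ... warm up / compile language-model graphs here ...
        lm_time = time.perf_counter() - start

        start = time.perf_counter()
        # ... warm up / compile encoder graphs here (if present) ...
        encoder_time = time.perf_counter() - start

        return CompilationTimes(language_model=lm_time, encoder=encoder_time)


times = HPUWorkerSketch().compile_or_warm_up_model()
```

Because a NamedTuple is still a tuple, callers that only unpacked or summed the old float-shaped value fail loudly rather than silently, which is how the regression surfaced in the tests below.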
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `vllm_gaudi/v1/worker/hpu_worker.py` | Align worker warmup/compile return value with the upstream `CompilationTimes` interface. |
| `vllm_gaudi/ops/hpu_fused_moe.py` | Add zero-expert router parameters and routing selection to match upstream MoE router factory changes. |
Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
✅ CI Passed: All checks passed successfully against the following vllm commit:
adobrzyn
approved these changes
Apr 16, 2026
yeonsily
pushed a commit
to yeonsily/vllm-gaudi
that referenced
this pull request
Apr 21, 2026
…stream regressions in HPU worker, MoE router, and offloading tests (vllm-project#1354)
bmyrcha
pushed a commit
to bmyrcha/vllm-gaudi
that referenced
this pull request
Apr 22, 2026
…stream regressions in HPU worker, MoE router, and offloading tests (vllm-project#1354)
Fix three upstream regressions that break HPU unit tests.
Changes
- `vllm_gaudi/v1/worker/hpu_worker.py`: `compile_or_warm_up_model()` now returns a `CompilationTimes` NamedTuple instead of a plain `float`, matching the new upstream contract introduced in "Measure encoder compile time seperate from llm backbone" (vllm#39240).
- `vllm_gaudi/ops/hpu_fused_moe.py`: add `zero_expert_type` and `num_logical_experts` parameters to the HPU override of `create_fused_moe_router()`, plus `ZeroExpertRouter` dispatch, matching the refactor in "[MoE Refactor] Refactor ZeroExpertFusedMoE into new framework" (vllm#35549).
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py`: remove `block_size` from `OffloadingEvent` constructor calls and update the assertion, matching the field removal in "[kv_offload+HMA][3/N]: Remove block_size from KVEvents" (vllm#36644).
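The router-factory change amounts to a dispatch on the new parameters. The sketch below uses hypothetical stand-in classes (`FusedMoERouterSketch`, `ZeroExpertRouterSketch`) rather than the real vLLM router types, and assumes the override falls back to the default router when no zero-expert type is configured.

```python
from typing import Optional


class FusedMoERouterSketch:
    """Hypothetical default router (not the real vLLM class)."""
    def __init__(self, num_experts: int):
        self.num_experts = num_experts


class ZeroExpertRouterSketch(FusedMoERouterSketch):
    """Hypothetical stand-in for upstream's ZeroExpertRouter."""
    def __init__(self, num_experts: int, zero_expert_type: str,
                 num_logical_experts: int):
        super().__init__(num_experts)
        self.zero_expert_type = zero_expert_type
        self.num_logical_experts = num_logical_experts


def create_fused_moe_router(num_experts: int,
                            zero_expert_type: Optional[str] = None,
                            num_logical_experts: int = 0) -> FusedMoERouterSketch:
    # Dispatch mirrors the described HPU override: when a zero-expert
    # type is configured, route through the zero-expert-aware router;
    # otherwise keep the plain fused-MoE router.
    if zero_expert_type is not None:
        return ZeroExpertRouterSketch(num_experts, zero_expert_type,
                                      num_logical_experts)
    return FusedMoERouterSketch(num_experts)


default_router = create_fused_moe_router(8)
zero_router = create_fused_moe_router(8, zero_expert_type="zero",
                                      num_logical_experts=10)
```

Keeping the extra parameters keyword-friendly with defaults lets the HPU override stay signature-compatible with upstream callers that do not use zero experts.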
Fixed tests
- `tests/unit_tests/lora/test_llama_tp.py::test_llama_lora`
- `tests/unit_tests/lora/test_llm_with_multi_loras.py::test_multiple_lora_requests`
- `tests/unit_tests/test_embedding.py::test_embeddings[intfloat/e5-mistral-7b-instruct]`
- `tests/unit_tests/ops/test_hpu_fused_moe.py::test_unquantized_fused_moe_method`
- `tests/unit_tests/ops/test_hpu_compressed_tensors.py::test_compressed_tensors_wna16_moe_method`
- `tests/unit_tests/ops/test_hpu_compressed_tensors.py::test_compressed_tensors_w8a8fp8_block_moe_method`
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py::test_offloading_connector[True]`
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py::test_offloading_connector[False]`
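The offloading-test fix is the simplest of the three: after the upstream field removal, constructing the event with `block_size` raises a `TypeError`, so the test constructors and assertions drop it. `OffloadingEventSketch` below is a hypothetical stand-in, not the real vLLM `OffloadingEvent`; the remaining field names are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OffloadingEventSketch:
    """Hypothetical stand-in for the post-removal event (no block_size)."""
    block_hashes: tuple
    medium: str
    removed: bool


# Constructor calls in the test no longer pass block_size.
event = OffloadingEventSketch(block_hashes=(1, 2, 3), medium="cpu",
                              removed=False)
field_names = {f.name for f in fields(event)}
```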