[FIX_FOR_VLLM_CUSTOM=fc701c80588c215f84af0b745edcf4d127e276bc] Fix upstream regressions in HPU worker, MoE router, and offloading tests#1354
Merged
adobrzyn merged 3 commits on Apr 16, 2026
Conversation
Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
…router Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Contributor
Pull request overview
Fixes upstream-compat regressions in the Gaudi vLLM plugin by aligning (1) the HPU worker warmup/compile return type with the new vLLM CompilationTimes contract and (2) the HPU fused-MoE router factory signature/dispatch with upstream’s zero-expert routing refactor.
Changes:
- Update `HPUWorker.compile_or_warm_up_model()` to return a `CompilationTimes` NamedTuple (language_model + encoder) instead of a float.
- Extend the `create_fused_moe_router()` override to accept `zero_expert_type`/`num_logical_experts` and dispatch to `ZeroExpertRouter` when configured.
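The first change can be sketched as follows. `CompilationTimes` and `HPUWorkerSketch` here are stand-ins illustrating the per-component return contract described above, not the actual vLLM/vllm-gaudi definitions; the field names are assumptions.

```python
import time
from typing import NamedTuple


class CompilationTimes(NamedTuple):
    """Stand-in for the upstream vLLM NamedTuple (field names assumed)."""
    language_model: float
    encoder: float


class HPUWorkerSketch:
    def compile_or_warm_up_model(self) -> CompilationTimes:
        # Previously this returned a single float; the new upstream
        # contract expects separate timings for the language-model
        # backbone and the encoder.
        start = time.perf_counter()
        # ... warm up / compile language-model graphs here ...
        lm_time = time.perf_counter() - start

        start = time.perf_counter()
        # ... warm up / compile encoder graphs here (if present) ...
        encoder_time = time.perf_counter() - start

        return CompilationTimes(language_model=lm_time, encoder=encoder_time)


times = HPUWorkerSketch().compile_or_warm_up_model()
```

Because a NamedTuple is still a tuple, callers that only unpacked or summed the old float-shaped value fail loudly rather than silently, which is how the regression surfaced in the tests below.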
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `vllm_gaudi/v1/worker/hpu_worker.py` | Align worker warmup/compile return value with the upstream `CompilationTimes` interface. |
| `vllm_gaudi/ops/hpu_fused_moe.py` | Add zero-expert router parameters and routing selection to match upstream MoE router factory changes. |
Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
✅ CI Passed: All checks passed successfully against the following vllm commit:
adobrzyn
approved these changes
Apr 16, 2026
yeonsily
pushed a commit
to yeonsily/vllm-gaudi
that referenced
this pull request
Apr 21, 2026
…stream regressions in HPU worker, MoE router, and offloading tests (vllm-project#1354)
bmyrcha
pushed a commit
to bmyrcha/vllm-gaudi
that referenced
this pull request
Apr 22, 2026
…stream regressions in HPU worker, MoE router, and offloading tests (vllm-project#1354)
Fix three upstream regressions that break HPU unit tests.
Changes
- `vllm_gaudi/v1/worker/hpu_worker.py`: `compile_or_warm_up_model()` now returns a `CompilationTimes` NamedTuple instead of a plain `float`, matching the new upstream contract introduced in "Measure encoder compile time seperate from llm backbone" (vllm#39240).
- `vllm_gaudi/ops/hpu_fused_moe.py`: add `zero_expert_type` and `num_logical_experts` parameters to the HPU override of `create_fused_moe_router()`, plus `ZeroExpertRouter` dispatch, matching the refactor in "[MoE Refactor] Refactor ZeroExpertFusedMoE into new framework" (vllm#35549).
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py`: remove `block_size` from `OffloadingEvent` constructor calls and update the assertion, matching the field removal in "[kv_offload+HMA][3/N]: Remove block_size from KVEvents" (vllm#36644).
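The router-factory change amounts to a dispatch on the new parameters. The sketch below uses hypothetical stand-in classes (`FusedMoERouterSketch`, `ZeroExpertRouterSketch`) rather than the real vLLM router types, and assumes the override falls back to the default router when no zero-expert type is configured.

```python
from typing import Optional


class FusedMoERouterSketch:
    """Hypothetical default router (not the real vLLM class)."""
    def __init__(self, num_experts: int):
        self.num_experts = num_experts


class ZeroExpertRouterSketch(FusedMoERouterSketch):
    """Hypothetical stand-in for upstream's ZeroExpertRouter."""
    def __init__(self, num_experts: int, zero_expert_type: str,
                 num_logical_experts: int):
        super().__init__(num_experts)
        self.zero_expert_type = zero_expert_type
        self.num_logical_experts = num_logical_experts


def create_fused_moe_router(num_experts: int,
                            zero_expert_type: Optional[str] = None,
                            num_logical_experts: int = 0) -> FusedMoERouterSketch:
    # Dispatch mirrors the described HPU override: when a zero-expert
    # type is configured, route through the zero-expert-aware router;
    # otherwise keep the plain fused-MoE router.
    if zero_expert_type is not None:
        return ZeroExpertRouterSketch(num_experts, zero_expert_type,
                                      num_logical_experts)
    return FusedMoERouterSketch(num_experts)


default_router = create_fused_moe_router(8)
zero_router = create_fused_moe_router(8, zero_expert_type="zero",
                                      num_logical_experts=10)
```

Keeping the extra parameters keyword-friendly with defaults lets the HPU override stay signature-compatible with upstream callers that do not use zero experts.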
Fixed tests
- `tests/unit_tests/lora/test_llama_tp.py::test_llama_lora`
- `tests/unit_tests/lora/test_llm_with_multi_loras.py::test_multiple_lora_requests`
- `tests/unit_tests/test_embedding.py::test_embeddings[intfloat/e5-mistral-7b-instruct]`
- `tests/unit_tests/ops/test_hpu_fused_moe.py::test_unquantized_fused_moe_method`
- `tests/unit_tests/ops/test_hpu_compressed_tensors.py::test_compressed_tensors_wna16_moe_method`
- `tests/unit_tests/ops/test_hpu_compressed_tensors.py::test_compressed_tensors_w8a8fp8_block_moe_method`
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py::test_offloading_connector[True]`
- `tests/unit_tests/kv_offload/offloading_connector/test_scheduler.py::test_offloading_connector[False]`
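The offloading-test fix is the simplest of the three: after the upstream field removal, constructing the event with `block_size` raises a `TypeError`, so the test constructors and assertions drop it. `OffloadingEventSketch` below is a hypothetical stand-in, not the real vLLM `OffloadingEvent`; the remaining field names are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OffloadingEventSketch:
    """Hypothetical stand-in for the post-removal event (no block_size)."""
    block_hashes: tuple
    medium: str
    removed: bool


# Constructor calls in the test no longer pass block_size.
event = OffloadingEventSketch(block_hashes=(1, 2, 3), medium="cpu",
                              removed=False)
field_names = {f.name for f in fields(event)}
```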