[FIX_FOR_VLLM_CUSTOM=dcacdf9a8860a86401127d1c8f93ebf3cfbfd026] Fix MultiModelEngineClient, Qwen3.5 compilation, and EPLB refactoring by pawel-olejniczak · Pull Request #1436 · vllm-project/vllm-gaudi

pawel-olejniczak · 2026-05-11T18:04:07Z

Fix upstream regressions affecting hourly CI:

- MultiModelEngineClient: Added missing notify_kv_transfer_request_rejected abstract method (upstream PR [Bugfix][KV Transfer][NIXL] Notify P node on pre-admission rejection to free stranded KV blocks vllm#41269)
- Qwen3.5 test harness: Updated test_common.py to read enforce_eager from model card config (with env var override), enabling per-model compilation control
- EPLB refactoring: Removed EMPTY_EPLB_STATE import and enable_eplb parameter from patched_create_fused_moe_router after upstream MoE refactor (upstream PR [MoE Refactor] EPLB refactoring for FusedMoE vllm#41055)

Note: The enforce_eager: true workaround for Qwen3.5 compilation has been removed — the root cause (mamba_type str-vs-Enum comparison in hybrid cache allocation) is properly fixed by #1449, which should merge first.
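The str-vs-Enum pitfall named in the note above can be illustrated with a minimal sketch. The `MambaType` enum, its members, and both helper functions are hypothetical stand-ins, not the actual vLLM or vllm-gaudi code, and the sketch assumes a plain `Enum` (a `str`-mixin enum would compare equal to its string value).

```python
from enum import Enum


class MambaType(Enum):
    """Hypothetical stand-in for the real mamba_type enum."""
    MAMBA2 = "mamba2"
    LINEAR_ATTENTION = "linear_attention"


def is_mamba2_buggy(mamba_type) -> bool:
    # A plain Enum member never compares equal to a bare string,
    # so this check silently fails once callers pass Enum members.
    return mamba_type == "mamba2"


def is_mamba2_fixed(mamba_type) -> bool:
    # Normalize to the Enum first; MambaType(...) accepts either the
    # string value or an existing member.
    return MambaType(mamba_type) is MambaType.MAMBA2


if __name__ == "__main__":
    print(is_mamba2_buggy(MambaType.MAMBA2))  # False: the silent mismatch
    print(is_mamba2_fixed(MambaType.MAMBA2))  # True
    print(is_mamba2_fixed("mamba2"))          # True
```

A mismatch of this kind raises no error; the branch is simply never taken, which is why it may only surface later as a downstream failure.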

Verified on HPU: unit tests pass on Gaudi 3 (MoE, FP8, compressed tensors).

github-actions · 2026-05-11T18:05:17Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

Copilot

Pull request overview

This PR aims to restore compatibility with upstream vLLM changes and to avoid an HPU torch.compile failure for the Qwen3.5-35B-A3B evaluation path by forcing eager execution.

Changes:

- Implement EngineClient.notify_kv_transfer_request_rejected() in MultiModelEngineClient by delegating to the underlying engine.
- Update the Qwen3.5-35B-A3B full-test model card to set enforce_eager: true (intended to bypass graph compilation).
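The delegation in the first item can be sketched as follows. This is a simplified illustration: the class names, the synchronous signature, and the `request_id` parameter are assumptions, and the real `EngineClient` method may be asynchronous or take different arguments.

```python
from abc import ABC, abstractmethod


class EngineClientSketch(ABC):
    """Hypothetical stand-in for the upstream EngineClient interface."""

    @abstractmethod
    def notify_kv_transfer_request_rejected(self, request_id: str) -> None:
        ...


class SingleEngineSketch(EngineClientSketch):
    """Hypothetical underlying engine that records rejected requests."""

    def __init__(self) -> None:
        self.rejected: list[str] = []

    def notify_kv_transfer_request_rejected(self, request_id: str) -> None:
        self.rejected.append(request_id)


class MultiModelClientSketch(EngineClientSketch):
    """Wrapper that forwards the call to the engine it wraps."""

    def __init__(self, engine: EngineClientSketch) -> None:
        self._engine = engine

    def notify_kv_transfer_request_rejected(self, request_id: str) -> None:
        # Pure delegation: the wrapper adds no behavior of its own.
        self._engine.notify_kv_transfer_request_rejected(request_id)


class IncompleteClientSketch(EngineClientSketch):
    """Omits the new abstract method, like the wrapper before the fix."""


def can_instantiate(cls, *args) -> bool:
    # Python refuses to instantiate a class with unimplemented
    # abstract methods and raises TypeError.
    try:
        cls(*args)
        return True
    except TypeError:
        return False
```

When a base class gains a new abstract method, every subclass that does not override it can no longer be instantiated, which is the kind of breakage the delegation method addresses.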

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `vllm_gaudi/entrypoints/openai/multi_model_api_server.py` | Adds delegation method to satisfy upstream `EngineClient` abstract API and includes minor formatting-only adjustments. |
| `tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml` | Adds `enforce_eager: true` to attempt to disable compile/graph capture for this model’s full-test run. |

github-actions · 2026-05-12T17:09:53Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

…ltiModelEngineClient abstract method and Qwen3.5 compilation

- Add notify_kv_transfer_request_rejected() delegation to MultiModelEngineClient (upstream PR #41269 added a new abstract method to EngineClient)
- Set enforce_eager=true for Qwen3.5-35B-A3B model card to work around an aot_autograd view mutation assertion that fires during HPU graph compilation (upstream compilation changes between vLLM 8eb40113 and 9efdddca trigger an incompatibility with HPU's monkey-patched attention)

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

The enforce_eager parameter was only read from the ENFORCE_EAGER environment variable, ignoring the value set in model card YAML files. This caused the Qwen3.5-35B-A3B test to fail with a BackendCompilerFailed error on HPU because torch.compile was not disabled despite enforce_eager: true being set in the model card.

Read enforce_eager from eval_config (model card) first, with the env var as an override, consistent with how trust_remote_code, dtype, and other model config fields are handled.

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
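The precedence this commit message describes (model card value first, environment variable as override) can be sketched as below. The function name, the injected `environ` mapping, and the set of accepted truthy strings are assumptions for illustration; the actual logic in test_common.py may parse the variable differently.

```python
import os
from typing import Mapping

# Assumed set of strings treated as "true"; the real harness may differ.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_enforce_eager(eval_config: Mapping[str, object],
                          environ: Mapping[str, str] = os.environ) -> bool:
    """Return the effective enforce_eager setting for one model card."""
    # 1. Start from the model card (eval_config), defaulting to False.
    value = bool(eval_config.get("enforce_eager", False))

    # 2. Let ENFORCE_EAGER override the card, but only when it is set.
    env_value = environ.get("ENFORCE_EAGER")
    if env_value is not None:
        value = env_value.strip().lower() in _TRUTHY
    return value


if __name__ == "__main__":
    card = {"enforce_eager": True}
    print(resolve_enforce_eager(card, {}))                          # True
    print(resolve_enforce_eager(card, {"ENFORCE_EAGER": "false"}))  # False
```

Treating an unset variable as "no override" rather than as false is what lets a per-model setting take effect when the environment says nothing.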

… EMPTY_EPLB_STATE import and enable_eplb parameter after upstream EPLB refactor Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

Proper fix for Qwen3.5 compilation (mamba_type Enum comparison) is in PR vllm-project#1449. The enforce_eager workaround causes performance degradation and is unnecessary once vllm-project#1449 merges.

Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

github-actions · 2026-05-18T16:50:16Z

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

github-actions · 2026-05-19T10:43:38Z

✅ CI Passed

All checks passed successfully against the following vllm commit:
dcacdf9a8860a86401127d1c8f93ebf3cfbfd026

Copilot AI review requested due to automatic review settings May 11, 2026 18:04

pawel-olejniczak requested review from PatrykWo, adobrzyn, afierka-intel, iboiko-habana, jbyczkow, kamil-kaczor, ksmusz, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners May 11, 2026 18:04

Copilot started reviewing on behalf of pawel-olejniczak May 11, 2026 18:04

Copilot AI reviewed May 11, 2026

Comment thread tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml

github-actions Bot mentioned this pull request May 11, 2026

🚦 Team Review Dashboard #701

Open

pawel-olejniczak force-pushed the fix/vllm-hourly-ci-fixes branch 3 times, most recently from 4a53ec6 to 81b2daa Compare May 12, 2026 13:50

pawel-olejniczak changed the title ~~[FIX_FOR_VLLM_CUSTOM=9efdddca283cc0eb7c37fa49c4a9d1c9bf59ec4e] Fix MultiModelEngineClient abstract method and Qwen3.5 compilation~~ [FIX_FOR_VLLM_LATEST] Fix MultiModelEngineClient abstract method and Qwen3.5 compilation May 12, 2026

pawel-olejniczak changed the title ~~[FIX_FOR_VLLM_LATEST] Fix MultiModelEngineClient abstract method and Qwen3.5 compilation~~ [FIX_FOR_VLLM_CUSTOM=9efdddca283cc0eb7c37fa49c4a9d1c9bf59ec4e] Fix MultiModelEngineClient abstract method and Qwen3.5 compilation May 12, 2026

pawel-olejniczak force-pushed the fix/vllm-hourly-ci-fixes branch 2 times, most recently from c355fb7 to 6d45ba9 Compare May 13, 2026 09:02

pawel-olejniczak added 2 commits May 13, 2026 16:18

pawel-olejniczak force-pushed the fix/vllm-hourly-ci-fixes branch from 6d45ba9 to 4488cec Compare May 13, 2026 16:59

pawel-olejniczak changed the title ~~[FIX_FOR_VLLM_CUSTOM=9efdddca283cc0eb7c37fa49c4a9d1c9bf59ec4e] Fix MultiModelEngineClient abstract method and Qwen3.5 compilation~~ [FIX_FOR_VLLM_LATEST] Fix MultiModelEngineClient, Qwen3.5 compilation, and EPLB refactoring May 13, 2026

pawel-olejniczak changed the title ~~[FIX_FOR_VLLM_LATEST] Fix MultiModelEngineClient, Qwen3.5 compilation, and EPLB refactoring~~ [FIX_FOR_VLLM_CUSTOM=dcacdf9a8860a86401127d1c8f93ebf3cfbfd026] Fix MultiModelEngineClient, Qwen3.5 compilation, and EPLB refactoring May 13, 2026

pawel-olejniczak force-pushed the fix/vllm-hourly-ci-fixes branch from 4488cec to 35b43be Compare May 13, 2026 19:00

[FIX_FOR_VLLM_CUSTOM=dcacdf9a8860a86401127d1c8f93ebf3cfbfd026] Remove…

b8231a4

… EMPTY_EPLB_STATE import and enable_eplb parameter after upstream EPLB refactor Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>

pawel-olejniczak force-pushed the fix/vllm-hourly-ci-fixes branch from 35b43be to b8231a4 Compare May 14, 2026 08:13

Merge branch 'main' into fix/vllm-hourly-ci-fixes

1ebd6ae

shepark reviewed May 14, 2026

Comment thread tests/full_tests/model_cards/qwen3.5-35b-a3b.yaml Outdated

pawel-olejniczak and others added 4 commits May 15, 2026 11:00

Merge branch 'main' into fix/vllm-hourly-ci-fixes

0214a16

Merge branch 'main' into fix/vllm-hourly-ci-fixes

f16b3d1

Merge branch 'main' into fix/vllm-hourly-ci-fixes

af379ee

pawel-olejniczak added 2 commits May 18, 2026 21:03

Merge branch 'main' into fix/vllm-hourly-ci-fixes

f6c866c

Merge branch 'main' into fix/vllm-hourly-ci-fixes

2b2e410

iboiko-habana approved these changes May 19, 2026

iboiko-habana merged commit c0a59cf into vllm-project:main May 19, 2026
2 checks passed

iboiko-habana merged 10 commits into vllm-project:main from pawel-olejniczak:fix/vllm-hourly-ci-fixes

4 participants
