fix(xpu): Re-compute compile ranges after platform-specific config updates #37523
ProExpertProg merged 7 commits into vllm-project:main
Conversation
Code Review
This pull request correctly addresses an AssertionError during model compilation warmup by filtering out warmup sizes that exceed the model runner's token capacity. The change ensures that _dummy_run is only called with valid sizes, preventing the crash. My feedback includes a suggestion to optimize the code by partitioning the warmup sizes in a single pass, which improves efficiency and readability.
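The single-pass partition suggested in this review could look like the following sketch. This is illustrative only: the names `partition_warmup_sizes`, `warmup_sizes`, and `max_num_tokens` are hypothetical, not actual vLLM identifiers.

```python
# Hypothetical sketch: split warmup sizes into valid/invalid in ONE pass,
# instead of filtering the list twice. Names are illustrative, not vLLM's.
def partition_warmup_sizes(warmup_sizes, max_num_tokens):
    valid, invalid = [], []
    for size in warmup_sizes:
        # Append each size to exactly one bucket based on the capacity check.
        (valid if size <= max_num_tokens else invalid).append(size)
    return valid, invalid

valid, skipped = partition_warmup_sizes([512, 2048, 4096, 8192], 4096)
# valid == [512, 2048, 4096], skipped == [8192]
```

The single pass avoids iterating the list twice and makes the "skipped" sizes available for a warning log if desired.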
what's your
The XPU compile warmup runs one or more forward passes using a set of “preset/enumerated” token counts to trigger graph compilation; however, under certain configurations, these token counts may exceed the actual token capacity allowed by the current ModelRunner, making them “invalid warmup sizes.”
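As a concrete illustration of the mismatch (numbers are the hypothetical "e.g." values from the PR description, not measured ones): endpoints computed from the default `max_num_batched_tokens` can exceed the capacity the runner ends up with after the platform lowers that value.

```python
# Illustrative only: endpoints computed BEFORE the platform update include
# sizes the runner can no longer execute after the cap is lowered.
default_max_num_batched_tokens = 8192   # value seen by _set_compile_ranges()
xpu_mla_cap = 4096                      # value after check_and_update_config()

warmup_sizes = [512, 2048, 4096, 8192]  # preset/enumerated token counts
invalid = [s for s in warmup_sizes if s > xpu_mla_cap]
print(invalid)  # [8192] -> would trip `assert num_tokens <= self.max_num_tokens`
```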
does CUDA have the same behavior? If not, there may be some XPU config setting error; we should fix that instead.
Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Yuxiang Liang <yuliang@habana.ai>
…ation updates Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Hi @Liangyx2, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
actually I am thinking we should remove this https://github.com/vllm-project/vllm/blob/main/vllm/platforms/xpu.py#L212-L221
and I plan to land this for MLA (#37143) soon.
ProExpertProg
left a comment
Looks good, please remove the old call though
Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com>
Head branch was pushed to by a user without write access
…dates (vllm-project#37523) Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com> Signed-off-by: Yuxiang Liang <yuliang@habana.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
### What this PR does / why we need it?
Main2main: upgrade the vLLM commit to 0320 17:00.
1. Fix: vLLM refactored `_moe_forward` to call `runner.forward_impl_chunked()` when `runner.use_dp_chunking` is True. vLLM PR: "[MoE Refactor] DefaultMoERunner simplification" [#33049](vllm-project/vllm#33049)
2. Fix: vLLM moved the call to `self._set_compile_ranges()` in `VllmConfig.__post_init__` from **before** `check_and_update_config()` to **after** it (to allow platforms to lower `max_num_batched_tokens` first). vLLM PR: "fix(xpu): Re-compute compile ranges after platform-specific config updates" [#37523](vllm-project/vllm#37523)

### Does this PR introduce _any_ user-facing change?
NA

### How was this patch tested?
NA

- vLLM version: v0.17.0
- vLLM main: vllm-project/vllm@8b63257

Signed-off-by: leo-pony <nengjunma@outlook.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
Summary
Fix compile range computation order to respect platform-specific scheduler config updates.
This ensures torch.compile warmup uses sizes that are valid for the actual max_num_batched_tokens,
particularly on the XPU backend, where MLA models have constraints that reduce this value.
Issue
When using torch.compile mode on the XPU backend with MLA-enabled models:
1. `_set_compile_ranges()` computes endpoints based on the default `max_num_batched_tokens` (e.g., 8192)
2. `check_and_update_config()` for XPU detects MLA and lowers `max_num_batched_tokens` (e.g., to 4096)
3. Warmup then calls `_dummy_run()` with the stale sizes, tripping `assert num_tokens <= self.max_num_tokens`

This is a configuration order bug: compile ranges should be finalized AFTER platform
config updates, not before.
Root Cause
In `VllmConfig.__post_init__()`:
1. `_set_compile_ranges()` called (uses original `max_num_batched_tokens`)
2. `check_and_update_config()` called (XPU may lower `max_num_batched_tokens`)

Fix
Move the second `_set_compile_ranges()` call to execute immediately after `check_and_update_config()` to ensure compile ranges reflect all platform-specific scheduler config updates.
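A minimal sketch of the corrected ordering, assuming a heavily simplified stand-in for `VllmConfig.__post_init__` (the class body, `FakePlatform`, and the range computation here are all hypothetical; the real vLLM code differs):

```python
# Hypothetical, simplified sketch of the ordering fix; not the real vLLM code.
class FakePlatform:
    @staticmethod
    def check_and_update_config(cfg):
        cfg.max_num_batched_tokens = 4096  # e.g. XPU lowers the cap for MLA

class VllmConfigSketch:
    def __init__(self):
        self.max_num_batched_tokens = 8192  # default value
        # Fixed order: platform updates run FIRST, then compile ranges.
        FakePlatform.check_and_update_config(self)
        self._set_compile_ranges()

    def _set_compile_ranges(self):
        # Ranges now reflect the post-update cap (4096), not the default.
        self.compile_ranges = [s for s in (512, 2048, 4096, 8192)
                               if s <= self.max_num_batched_tokens]

cfg = VllmConfigSketch()
print(cfg.compile_ranges)  # [512, 2048, 4096]
```

With the pre-fix order (ranges computed before `check_and_update_config`), the 8192 endpoint would survive and later fail the `_dummy_run` assertion.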
Testing
Changes
`vllm/config/vllm.py`: Re-invoke `_set_compile_ranges()` after `check_and_update_config()`

Notes
This is a root-cause fix addressing the configuration order issue rather than
working around it in the warmup phase. It applies universally and prevents similar
issues on other platforms with custom scheduler config logic.