skip HPU graphs for long prefills #1033

Merged
czhu15 merged 2 commits into vllm-project:aice from yangulei:long_seq_graph on Feb 27, 2026

Conversation

@yangulei
Collaborator

Re-apply #780 to avoid OOM errors caused by too many unnecessary HPU graphs being captured for long prefills.

Signed-off-by: Youlei Yang <youlei.yang@intel.com>
Copilot AI (Contributor) left a comment

Pull request overview

This PR re-applies the “skip HPU graphs for long prefills” behavior to reduce HPU OOM risk by avoiding HPU graph capture for large prompt (prefill) workloads, especially when long context is involved (e.g., chunked prefill / prefix caching scenarios).

Changes:

  • Introduces max_graph_capture_tokens, defaulting to max_num_batched_tokens when max_cudagraph_capture_size is unset.
  • Updates the HPU-graph enable/disable heuristic to account for context size when deciding whether to bypass HPU graphs for prefills (a sketch of this heuristic follows below).

Comment thread on vllm_gaudi/v1/worker/hpu_model_runner.py (Outdated)
Comment thread on vllm_gaudi/v1/worker/hpu_model_runner.py (Outdated)
Comment thread on vllm_gaudi/v1/worker/hpu_model_runner.py
Signed-off-by: Youlei Yang <youlei.yang@intel.com>
@taotod (Collaborator) left a comment

LGTM

@czhu15 (Collaborator) left a comment

LGTM

@czhu15 merged commit 809daf8 into vllm-project:aice on Feb 27, 2026
1 check passed
czhu15 pushed a commit that referenced this pull request on Feb 27, 2026

Re-apply #780 to avoid OOM error caused by too many unnecessary HPU graphs captured for long prefills.

---------

Signed-off-by: Youlei Yang <youlei.yang@intel.com>
tvoas pushed a commit to tvoas/vllm-gaudi that referenced this pull request on Mar 11, 2026

Re-apply vllm-project#780 to avoid OOM error caused by too many unnecessary HPU graphs captured for long prefills.

---------

Signed-off-by: Youlei Yang <youlei.yang@intel.com>

4 participants