skip HPU graphs for long prefills by yangulei · Pull Request #1033 · vllm-project/vllm-gaudi

yangulei · 2026-02-25T06:02:30Z

Re-apply #780 to avoid OOM errors caused by capturing too many unnecessary HPU graphs for long prefills.

Signed-off-by: Youlei Yang <youlei.yang@intel.com>

Copilot

Pull request overview

This PR re-applies the “skip HPU graphs for long prefills” behavior to reduce HPU OOM risk by avoiding HPU graph capture for large prompt (prefill) workloads, especially when long context is involved (e.g., chunked prefill / prefix caching scenarios).

Changes:

Introduces max_graph_capture_tokens, defaulting to max_num_batched_tokens when max_cudagraph_capture_size is unset.
Updates the HPU-graph enable/disable heuristic to account for context size when deciding whether to bypass HPU graphs for prefills.
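The two changes above can be illustrated with a minimal sketch. This is not the code from `hpu_model_runner.py`: the `GraphCaptureConfig` container, both function names, and the exact comparison (query tokens plus context tokens against the cap) are assumptions made for illustration. Only the option names `max_graph_capture_tokens`, `max_num_batched_tokens`, and `max_cudagraph_capture_size`, and the use of an `is_prompt` flag (see the follow-up commit in this PR), come from the PR itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GraphCaptureConfig:
    """Hypothetical container; field names mirror the options named in the overview."""
    max_num_batched_tokens: int
    max_cudagraph_capture_size: Optional[int] = None


def resolve_max_graph_capture_tokens(cfg: GraphCaptureConfig) -> int:
    """Fall back to max_num_batched_tokens when no capture size is set."""
    if cfg.max_cudagraph_capture_size is not None:
        return cfg.max_cudagraph_capture_size
    return cfg.max_num_batched_tokens


def use_hpu_graphs(is_prompt: bool, batch_size: int, seq_len: int,
                   context_len: int, max_graph_capture_tokens: int) -> bool:
    """Decide whether a batch should run through an HPU graph.

    Assumed rule: decode batches always use graphs; prefill batches
    bypass graphs once query plus context tokens exceed the cap.
    """
    if not is_prompt:
        return True
    total_tokens = batch_size * (seq_len + context_len)
    return total_tokens <= max_graph_capture_tokens


if __name__ == "__main__":
    cfg = GraphCaptureConfig(max_num_batched_tokens=8192)
    limit = resolve_max_graph_capture_tokens(cfg)
    # Short prefill without cached context: eligible for graph capture.
    print(use_hpu_graphs(True, 1, 2048, 0, limit))
    # Same chunk with a long cached context: graph capture is skipped.
    print(use_hpu_graphs(True, 1, 2048, 30000, limit))
```

Under these assumptions, a chunked-prefill step that is small on its own can still bypass graph capture when the accumulated context is large, which is the situation the overview associates with the OOM risk.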

taotod

LGTM

czhu15

LGTM

skip HPU graphs for long prefills

e501ef5

Signed-off-by: Youlei Yang <youlei.yang@intel.com>

Copilot AI review requested due to automatic review settings February 25, 2026 06:02

yangulei requested review from Wei-Lin-Intel, czhu15, mgawarkiewicz-intel, piotrbocian, taotod and wpyszka as code owners February 25, 2026 06:02

Copilot started reviewing on behalf of yangulei February 25, 2026 06:02

Copilot AI reviewed Feb 25, 2026

Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated

Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py Outdated

Comment thread vllm_gaudi/v1/worker/hpu_model_runner.py

use attn_metadata.is_prompt instead of seq_len > 1

9c6599e

Signed-off-by: Youlei Yang <youlei.yang@intel.com>

github-actions Bot mentioned this pull request Feb 25, 2026

🚦 Team Review Dashboard #701

Open

taotod approved these changes Feb 25, 2026

czhu15 approved these changes Feb 27, 2026

czhu15 merged commit 809daf8 into vllm-project:aice Feb 27, 2026
1 check passed

czhu15 pushed a commit that referenced this pull request Feb 27, 2026

skip HPU graphs for long prefills (#1033)

36bf5e5


czhu15 merged 2 commits into vllm-project:aice from yangulei:long_seq_graph
