
[XPU] [torch.compile] Skipping CUDA graph memory estimation to avoid startup errors.#39977

Merged
jikunshang merged 2 commits into vllm-project:main from chaojun-zhang:fix_start_error on Apr 20, 2026
Conversation

@chaojun-zhang
Contributor

@chaojun-zhang chaojun-zhang commented Apr 16, 2026

Purpose

Test Plan

Without PR:

(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971] WorkerProc hit an exception.
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971] Traceback (most recent call last):
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 966, in worker_busy_loop
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     output = func(*args, **kwargs)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_worker.py", line 381, in determine_available_memory
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     cudagraph_memory_estimate = self.model_runner.profile_cudagraph_memory()
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 5890, in profile_cudagraph_memory
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     self._init_minimal_kv_cache_for_profiling()
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 5834, in _init_minimal_kv_cache_for_profiling
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     self.initialize_kv_cache(minimal_config, is_profiling=True)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 6757, in initialize_kv_cache
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     kv_caches = self.initialize_kv_cache_tensors(
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 6673, in initialize_kv_cache_tensors
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     kv_cache_raw_tensors = self._allocate_kv_cache_tensors(kv_cache_config)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 6477, in _allocate_kv_cache_tensors
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     tensor = torch.zeros(
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]              ^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971] RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)

Test Result



@chaojun-zhang chaojun-zhang requested a review from njhill as a code owner April 16, 2026 07:08
@mergify Bot added the nvidia, intel-gpu (Related to Intel GPU), and v1 labels Apr 16, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the determine_available_memory method in gpu_worker.py to skip CUDA graph memory profiling on XPU platforms, consistent with the existing behavior for ROCm/HIP. This change prevents potential incorrect or negative memory estimates on XPU. I have no feedback to provide.
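The guard described above can be sketched as follows. This is an illustrative simplification, not the actual vLLM source: the function name mirrors `determine_available_memory`, but the parameters, platform strings, and the zero-cost fallback are assumptions for the example.

```python
# Sketch of platform-gated CUDA graph memory estimation (illustrative only;
# signatures and platform names are assumptions, not the real vLLM API).

def determine_available_memory(total_free_bytes, profile_cudagraph_memory, platform):
    """Return the memory budget, skipping CUDA graph profiling off-CUDA."""
    if platform in ("xpu", "rocm"):
        # CUDA graph capture is CUDA-specific. Running the capture-based
        # profiling path on XPU can crash the worker (e.g. the Level Zero
        # UR_RESULT_ERROR_DEVICE_LOST seen in the traceback above) or yield
        # incorrect/negative estimates, so assume zero graph overhead instead.
        cudagraph_estimate = 0
    else:
        cudagraph_estimate = profile_cudagraph_memory()
    return total_free_bytes - cudagraph_estimate
```

On XPU the profiling callback is never invoked, so the minimal-KV-cache allocation that triggered the device-lost error in the log above is avoided entirely.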

@chaojun-zhang chaojun-zhang changed the title [XPU] Skip cudagraph memory estimate on xpu to avoid start error [XPU] Skipping CUDA graph memory estimation to avoid startup errors. Apr 16, 2026
@jikunshang
Collaborator

see #39466 (comment)

@github-project-automation github-project-automation Bot moved this to Done in NVIDIA Apr 17, 2026
@chaojun-zhang chaojun-zhang reopened this Apr 17, 2026
@chaojun-zhang chaojun-zhang changed the title [XPU] Skipping CUDA graph memory estimation to avoid startup errors. [XPU] [torch.compile] Skipping CUDA graph memory estimation to avoid startup errors. Apr 17, 2026
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
@github-project-automation github-project-automation Bot moved this from Done to Ready in NVIDIA Apr 18, 2026
@jikunshang
Collaborator

cc @ProExpertProg PTAL, thanks!

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 18, 2026
@jikunshang jikunshang merged commit 4f4713f into vllm-project:main Apr 20, 2026
61 of 63 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in NVIDIA Apr 20, 2026
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Apr 20, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Adrian <info@zzit.ch>

Labels

intel-gpu Related to Intel GPU nvidia ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants