
[XPU] [torch.compile] Skipping CUDA graph memory estimation to avoid startup errors.#39977

Merged
jikunshang merged 2 commits into vllm-project:main from chaojun-zhang:fix_start_error on Apr 20, 2026
Conversation

@chaojun-zhang
Contributor

@chaojun-zhang chaojun-zhang commented Apr 16, 2026

Purpose

Test Plan

Without PR:

(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971] WorkerProc hit an exception.
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971] Traceback (most recent call last):
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/executor/multiproc_executor.py", line 966, in worker_busy_loop
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     output = func(*args, **kwargs)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_worker.py", line 381, in determine_available_memory
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     cudagraph_memory_estimate = self.model_runner.profile_cudagraph_memory()
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     return func(*args, **kwargs)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 5890, in profile_cudagraph_memory
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     self._init_minimal_kv_cache_for_profiling()
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 5834, in _init_minimal_kv_cache_for_profiling
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     self.initialize_kv_cache(minimal_config, is_profiling=True)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 6757, in initialize_kv_cache
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     kv_caches = self.initialize_kv_cache_tensors(
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 6673, in initialize_kv_cache_tensors
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     kv_cache_raw_tensors = self._allocate_kv_cache_tensors(kv_cache_config)
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]   File "/workspace/vllm/vllm/v1/worker/gpu_model_runner.py", line 6477, in _allocate_kv_cache_tensors
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]     tensor = torch.zeros(
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971]              ^^^^^^^^^^^^
(Worker_TP0 pid=43058) ERROR 04-16 07:04:03 [multiproc_executor.py:971] RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)

Test Result



@chaojun-zhang chaojun-zhang requested a review from njhill as a code owner April 16, 2026 07:08
@mergify Bot added the nvidia, intel-gpu (Related to Intel GPU), and v1 labels Apr 16, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the determine_available_memory method in gpu_worker.py to skip CUDA graph memory profiling on XPU platforms, consistent with the existing behavior for ROCm/HIP. This change prevents potential incorrect or negative memory estimates on XPU. I have no feedback to provide.
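The guard described above can be sketched as follows. This is an illustrative simplification, not the actual vLLM source: the function name mirrors `determine_available_memory`, but the parameters, platform strings, and the zero-cost fallback are assumptions for the example.

```python
# Sketch of platform-gated CUDA graph memory estimation (illustrative only;
# signatures and platform names are assumptions, not the real vLLM API).

def determine_available_memory(total_free_bytes, profile_cudagraph_memory, platform):
    """Return the memory budget, skipping CUDA graph profiling off-CUDA."""
    if platform in ("xpu", "rocm"):
        # CUDA graph capture is CUDA-specific. Running the capture-based
        # profiling path on XPU can crash the worker (e.g. the Level Zero
        # UR_RESULT_ERROR_DEVICE_LOST seen in the traceback above) or yield
        # incorrect/negative estimates, so assume zero graph overhead instead.
        cudagraph_estimate = 0
    else:
        cudagraph_estimate = profile_cudagraph_memory()
    return total_free_bytes - cudagraph_estimate
```

On XPU the profiling callback is never invoked, so the minimal-KV-cache allocation that triggered the device-lost error in the log above is avoided entirely.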

@chaojun-zhang chaojun-zhang changed the title [XPU] Skip cudagraph memory estimate on xpu to avoid start error [XPU] Skipping CUDA graph memory estimation to avoid startup errors. Apr 16, 2026
@jikunshang
Collaborator

see #39466 (comment)

@github-project-automation github-project-automation Bot moved this to Done in NVIDIA Apr 17, 2026
@chaojun-zhang chaojun-zhang reopened this Apr 17, 2026
@chaojun-zhang chaojun-zhang changed the title [XPU] Skipping CUDA graph memory estimation to avoid startup errors. [XPU] [torch.compile] Skipping CUDA graph memory estimation to avoid startup errors. Apr 17, 2026
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
@github-project-automation github-project-automation Bot moved this from Done to Ready in NVIDIA Apr 18, 2026
@jikunshang
Collaborator

cc @ProExpertProg PTAL, thanks!

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 18, 2026
@jikunshang jikunshang merged commit 4f4713f into vllm-project:main Apr 20, 2026
61 of 63 checks passed
@github-project-automation github-project-automation Bot moved this from Ready to Done in NVIDIA Apr 20, 2026
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Apr 20, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Lafunamor pushed a commit to Lafunamor/vllm that referenced this pull request May 1, 2026
…startup errors. (vllm-project#39977)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Adrian <info@zzit.ch>

Labels

intel-gpu Related to Intel GPU nvidia ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants