
fix: Add PIECEWISE cudagraph mode config for prefill server to avoid startup errors #29079

Open
xbfs wants to merge 4 commits into vllm-project:main from xbfs:fix/prefill-cudagraph-config

xbfs (Contributor) commented Nov 20, 2025

The default cudagraph configuration (FULL_AND_PIECEWISE) causes prefill instance startup errors. This change explicitly sets the cudagraph_mode to PIECEWISE for prefill servers in the disaggregated serving script.
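
As a rough illustration of the change described above, the sketch below builds a prefill-side launch command that pins `cudagraph_mode` to `PIECEWISE` through `--compilation-config`. The helper name `prefill_serve_args`, the model name, and the port are placeholders invented for this example; any other flags the actual script passes (for example, for KV transfer) are omitted.

```python
import json
import shlex


def prefill_serve_args(model: str, port: int,
                       cudagraph_mode: str = "PIECEWISE") -> list[str]:
    """Build an argv list for a prefill-side `vllm serve` process.

    Hypothetical helper for illustration only; it is not part of the PR.
    """
    # The compilation config is passed to the CLI as a JSON string.
    compilation_config = {"cudagraph_mode": cudagraph_mode}
    return [
        "vllm", "serve", model,
        "--port", str(port),
        "--compilation-config", json.dumps(compilation_config),
    ]


# Placeholder model name and port, chosen only for the example.
args = prefill_serve_args("my-org/my-model", 20003)

# shlex.join quotes the JSON so the line can be pasted into a shell script.
print(shlex.join(args))
```

Building the JSON with `json.dumps` rather than by hand keeps the quoting correct when the command is embedded in a shell script.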

mergify bot commented Nov 20, 2025

Documentation preview: https://vllm--29079.org.readthedocs.build/en/29079/

mergify bot added the documentation, nvidia, and kv-connector labels Nov 20, 2025

Signed-off-by: Bofeng BF1 Xue <xuebf1@Lenovo.com>

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses a startup error in the disaggregated serving example script. The error occurs in prefill server instances due to the default cudagraph_mode of FULL_AND_PIECEWISE. The proposed change correctly resolves this issue by explicitly setting the cudagraph_mode to PIECEWISE for the prefill server, using the --compilation-config argument. This is a targeted and appropriate fix, as the PIECEWISE mode is better suited for the dynamic nature of prefill operations, thus avoiding the startup failures. The implementation is correct, and I find no issues with this change.

xbfs force-pushed the fix/prefill-cudagraph-config branch from 4b9cb82 to 97a4457 on November 20, 2025 08:51
LucasWilkinson (Collaborator) commented:

Can you please document the errors you are seeing?

xbfs (Contributor, Author) commented Nov 25, 2025

When the prefill instance reaches the 'Capturing CUDA graphs (decode, FULL)' stage, it fails with the following error:
(EngineCore_DP0 pid=1940)   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 624, in unified_attention_with_output
(EngineCore_DP0 pid=1940)     maybe_save_kv_layer_to_connector(layer_name, kv_cache)
(EngineCore_DP0 pid=1940)   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/layer.py", line 556, in maybe_save_kv_layer_to_connector
(EngineCore_DP0 pid=1940)     connector.save_kv_layer(layer_name, kv_cache_layer,
(EngineCore_DP0 pid=1940)   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/v1/p2p/p2p_nccl_connector.py", line 267, in save_kv_layer
(EngineCore_DP0 pid=1940)     connector_metadata = self._get_connector_metadata()
(EngineCore_DP0 pid=1940)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1940)   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/v1/base.py", line 132, in _get_connector_metadata
(EngineCore_DP0 pid=1940)     assert self._connector_metadata is not None
(EngineCore_DP0 pid=1940)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1940) AssertionError
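
One plausible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the connector expects per-step metadata to be bound before `save_kv_layer` runs, and the FULL decode capture pass reaches that call with none bound. The toy class below is a hypothetical stand-in (it is not vLLM's connector, and its method bodies are invented) that reproduces only the guard shown in the last frame.

```python
class ToyConnector:
    """Hypothetical stand-in for a KV connector; not vLLM's actual class."""

    def __init__(self) -> None:
        # No metadata is available until a step explicitly binds it.
        self._connector_metadata = None

    def bind_connector_metadata(self, metadata: dict) -> None:
        self._connector_metadata = metadata

    def clear_connector_metadata(self) -> None:
        self._connector_metadata = None

    def _get_connector_metadata(self) -> dict:
        # Same guard as the final frame of the traceback.
        assert self._connector_metadata is not None
        return self._connector_metadata

    def save_kv_layer(self, layer_name: str, kv_layer: list) -> str:
        metadata = self._get_connector_metadata()
        return f"saved {layer_name} for {len(metadata['requests'])} request(s)"


connector = ToyConnector()

# A regular step: metadata is bound first, so the save succeeds.
connector.bind_connector_metadata({"requests": ["req-0"]})
normal_result = connector.save_kv_layer("layer.0", [0.0])
connector.clear_connector_metadata()

# A capture-like call (assumed scenario): nothing is bound, so the guard trips.
try:
    connector.save_kv_layer("layer.0", [0.0])
    capture_failed = False
except AssertionError:
    capture_failed = True

print(normal_result, capture_failed)
```

If this reading is right, it would be consistent with the PR's premise that the failure appears only in the 'decode, FULL' capture stage that the FULL_AND_PIECEWISE default adds.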

LucasWilkinson (Collaborator) commented:

cc @NickLucche @njhill

(potentially related: #27026)

github-actions bot commented:

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions bot added the stale label Feb 27, 2026
