[Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP #26574
youkaichao merged 4 commits into vllm-project:main
Conversation
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com>
Code Review
This pull request addresses an issue where enabling Decode Context Parallelism (DCP) was incompatible with full CUDA graph modes. The proposed fix forces the cudagraph_mode to PIECEWISE when DCP is active. While the intent is correct, the implementation is overly aggressive and will override a user's explicit choice to disable CUDA graphs entirely (cudagraph_mode=NONE). My review includes a critical comment to refine this logic, ensuring it only downgrades from FULL modes to PIECEWISE and warns the user, without affecting NONE mode.
if self.parallel_config.decode_context_parallel_size > 1:
    self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE
This implementation unconditionally sets cudagraph_mode to PIECEWISE whenever decode context parallelism (DCP) is enabled. That is too aggressive: it overrides a user's explicit choice to disable CUDA graphs entirely (cudagraph_mode=NONE), which may be intentional, e.g. for debugging.
A better approach is to only downgrade the mode to PIECEWISE if a FULL CUDA graph mode was requested, as those are the ones incompatible with DCP. This change also adds a warning to inform the user about the automatic adjustment.
if self.parallel_config.decode_context_parallel_size > 1 and \
        self.compilation_config.cudagraph_mode.has_full_cudagraphs():
    logger.warning(
        "Decode context parallel (DCP) is enabled, which is "
        "incompatible with full CUDA graphs. Downgrading "
        "cudagraph_mode from %s to PIECEWISE.",
        self.compilation_config.cudagraph_mode.name)
    self.compilation_config.cudagraph_mode = CUDAGraphMode.PIECEWISE
This code path only executes when cudagraph_mode is not explicitly set by the user.
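The downgrade logic discussed above can be sketched as a standalone function. This is a hedged illustration, not vLLM's actual code: the `CUDAGraphMode` enum here is a minimal stand-in for vLLM's real enum, and `resolve_cudagraph_mode` is a hypothetical helper showing the intended behavior (downgrade FULL modes to PIECEWISE under DCP, leave NONE untouched).

```python
import enum
import logging

logger = logging.getLogger(__name__)


class CUDAGraphMode(enum.Enum):
    # Minimal stand-in for vLLM's CUDAGraphMode enum (illustrative only).
    NONE = "NONE"
    PIECEWISE = "PIECEWISE"
    FULL = "FULL"
    FULL_AND_PIECEWISE = "FULL_AND_PIECEWISE"

    def has_full_cudagraphs(self) -> bool:
        # True for any mode that captures full CUDA graphs.
        return self in (CUDAGraphMode.FULL, CUDAGraphMode.FULL_AND_PIECEWISE)


def resolve_cudagraph_mode(mode: CUDAGraphMode, dcp_size: int) -> CUDAGraphMode:
    """Downgrade full CUDA graph modes to PIECEWISE when DCP is active.

    NONE is preserved so an explicit user opt-out of CUDA graphs
    (e.g. for debugging) is never overridden.
    """
    if dcp_size > 1 and mode.has_full_cudagraphs():
        logger.warning(
            "Decode context parallel (DCP) is enabled, which is incompatible "
            "with full CUDA graphs. Downgrading cudagraph_mode from %s to "
            "PIECEWISE.", mode.name)
        return CUDAGraphMode.PIECEWISE
    return mode
```

With DCP enabled (`dcp_size > 1`), `FULL` and `FULL_AND_PIECEWISE` resolve to `PIECEWISE`, while `NONE` and `PIECEWISE` pass through unchanged; with `dcp_size == 1`, every mode passes through.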
youkaichao left a comment:
cc @LucasWilkinson @youzhedian it should be possible to make dcp compatible with full cudagraph.
Purpose
#25444 changed the default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE. However, DCP does not currently support full CUDA graphs (#26022 (comment)). This PR changes the default CUDAGraphMode to PIECEWISE when DCP is enabled.
cc @youzhedian @youkaichao @LucasWilkinson
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.