diff --git a/docs/model-quirks.md b/docs/model-quirks.md
index 6ba7f12f55..7a79a95e66 100644
--- a/docs/model-quirks.md
+++ b/docs/model-quirks.md
@@ -39,6 +39,13 @@ Whether model level support CP only depends on arguments passed to `torch.nn.fun
 
 - It's a known issue that context parallel can't be used together with sequence parallel. Refer to [here](https://github.com/NVIDIA-NeMo/RL/issues/659) for more details.
 
+## DeepScaleR Recipe Convergence Issues
+
+The DeepScaleR recipe (e.g., `examples/configs/grpo-deepscaler-1.5b-8K.yaml`) has been found to experience convergence issues when CUDA graphs are enabled in vLLM.
+
+**Special Handling:**
+- CUDA graphs must be disabled by setting `enforce_eager: True` in the vLLM configuration (https://github.com/NVIDIA-NeMo/RL/pull/857 forces eager execution by default).
+
 ## vLLM Async Rollout Timeout
 
 vLLM async generation has a configurable timeout for waiting for individual sample results. This is particularly important for longer sequences on large models.
diff --git a/examples/configs/grpo-deepscaler-1.5b-24K.yaml b/examples/configs/grpo-deepscaler-1.5b-24K.yaml
index b8ab06496f..52d1ed2018 100644
--- a/examples/configs/grpo-deepscaler-1.5b-24K.yaml
+++ b/examples/configs/grpo-deepscaler-1.5b-24K.yaml
@@ -42,6 +42,7 @@ policy:
       tensor_parallel_size: 1
       pipeline_parallel_size: 1
       gpu_memory_utilization: 0.8
+      enforce_eager: True
       max_model_len: ${policy.max_total_sequence_length}
       # For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
       # For Gemma models, we need to use "auto" due to a vllm bug
diff --git a/examples/configs/grpo-deepscaler-1.5b-8K.yaml b/examples/configs/grpo-deepscaler-1.5b-8K.yaml
index a74c0ed38c..69976b5cb5 100644
--- a/examples/configs/grpo-deepscaler-1.5b-8K.yaml
+++ b/examples/configs/grpo-deepscaler-1.5b-8K.yaml
@@ -102,7 +102,7 @@ policy:
       pipeline_parallel_size: 1
       gpu_memory_utilization: 0.6
       max_model_len: ${policy.max_total_sequence_length}
-      enforce_eager: False
+      enforce_eager: True
       # For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
       # For Gemma models, we need to use "auto" due to a vllm bug
       load_format: dummy
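
For anyone applying the same workaround to their own recipe, here is a minimal sketch of where the flag sits. It assumes the `policy.generation.vllm_cfg` nesting suggested by the hunks above (surrounding keys are omitted and may differ per recipe); the only relevant change is `enforce_eager: True`.

```yaml
# Sketch only: assumes the policy.generation.vllm_cfg layout of the GRPO recipe configs.
policy:
  generation:
    vllm_cfg:
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}
      enforce_eager: True  # disable CUDA graphs; avoids the DeepScaleR convergence issue
      load_format: dummy
```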