7 changes: 7 additions & 0 deletions docs/model-quirks.md
@@ -39,6 +39,13 @@ Whether model level support CP only depends on arguments passed to `torch.nn.fun
- It's a known issue that context parallel can't be used together with sequence parallel.
Refer to [here](https://github.com/NVIDIA-NeMo/RL/issues/659) for more details.

## DeepScaleR Recipe Convergence Issues

The DeepScaleR recipe (e.g., `examples/configs/grpo-deepscaler-1.5b-8K.yaml`) has been observed to converge poorly when CUDA graphs are enabled in vLLM.

**Special Handling:**
- CUDA graphs must be disabled by setting `enforce_eager: True` in the vLLM configuration; https://github.com/NVIDIA-NeMo/RL/pull/857 forces eager execution by default for this recipe. See the config sketch below.
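
For reference, a minimal sketch of where the flag sits, assuming the vLLM settings live under `policy.generation.vllm_cfg` as suggested by the GRPO example configs changed in this PR (the exact nesting is an assumption; values match the 8K recipe below):

```yaml
# Sketch only: the policy.generation.vllm_cfg nesting is assumed from the example configs;
# unrelated keys are omitted.
policy:
  generation:
    vllm_cfg:
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}
      enforce_eager: True  # disable CUDA graphs to work around the convergence issue
```

If the launcher accepts dotted config overrides, the same setting should be reachable as `policy.generation.vllm_cfg.enforce_eager=True` on the command line (path assumed, matching the sketch above).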

## vLLM Async Rollout Timeout

vLLM async generation has a configurable timeout for waiting for individual sample results. This is particularly important for longer sequences on large models.
1 change: 1 addition & 0 deletions examples/configs/grpo-deepscaler-1.5b-24K.yaml
@@ -42,6 +42,7 @@ policy:
tensor_parallel_size: 1
pipeline_parallel_size: 1
gpu_memory_utilization: 0.8
enforce_eager: True
max_model_len: ${policy.max_total_sequence_length}
# For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
# For Gemma models, we need to use "auto" due to a vllm bug
2 changes: 1 addition & 1 deletion examples/configs/grpo-deepscaler-1.5b-8K.yaml
@@ -102,7 +102,7 @@ policy:
pipeline_parallel_size: 1
gpu_memory_utilization: 0.6
max_model_len: ${policy.max_total_sequence_length}
enforce_eager: False
enforce_eager: True
# For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
# For Gemma models, we need to use "auto" due to a vllm bug
load_format: dummy