7 changes: 7 additions & 0 deletions docs/model-quirks.md
@@ -39,6 +39,13 @@ Whether model level support CP only depends on arguments passed to `torch.nn.fun
- It's a known issue that context parallel can't be used together with sequence parallel.
Refer to [here](https://github.com/NVIDIA-NeMo/RL/issues/659) for more details.

## DeepScaleR Recipe Convergence Issues

The DeepScaleR recipe (e.g., `examples/configs/grpo-deepscaler-1.5b-8K.yaml`) has been observed to converge poorly when CUDA graphs are enabled in vLLM.

**Special Handling:**
- CUDA graphs must be disabled by setting `enforce_eager: True` in the vLLM configuration; https://github.com/NVIDIA-NeMo/RL/pull/857 forces eager execution by default for this recipe. See the config sketch below.
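
For reference, a minimal sketch of where the flag sits, assuming the vLLM settings live under `policy.generation.vllm_cfg` as suggested by the GRPO example configs changed in this PR (the exact nesting is an assumption; values match the 8K recipe below):

```yaml
# Sketch only: the policy.generation.vllm_cfg nesting is assumed from the example configs;
# unrelated keys are omitted.
policy:
  generation:
    vllm_cfg:
      gpu_memory_utilization: 0.6
      max_model_len: ${policy.max_total_sequence_length}
      enforce_eager: True  # disable CUDA graphs to work around the convergence issue
```

If the launcher accepts dotted config overrides, the same setting should be reachable as `policy.generation.vllm_cfg.enforce_eager=True` on the command line (path assumed, matching the sketch above).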

## vLLM Async Rollout Timeout

vLLM async generation has a configurable timeout for waiting for individual sample results. This is particularly important for longer sequences on large models.
1 change: 1 addition & 0 deletions examples/configs/grpo-deepscaler-1.5b-24K.yaml
@@ -42,6 +42,7 @@ policy:
tensor_parallel_size: 1
pipeline_parallel_size: 1
gpu_memory_utilization: 0.8
enforce_eager: True
max_model_len: ${policy.max_total_sequence_length}
# For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
# For Gemma models, we need to use "auto" due to a vllm bug
2 changes: 1 addition & 1 deletion examples/configs/grpo-deepscaler-1.5b-8K.yaml
@@ -102,7 +102,7 @@ policy:
pipeline_parallel_size: 1
gpu_memory_utilization: 0.6
max_model_len: ${policy.max_total_sequence_length}
enforce_eager: False
enforce_eager: True
# For most cases, use "dummy" to load the initial weights, since they will be overwritten during refit
# For Gemma models, we need to use "auto" due to a vllm bug
load_format: dummy