Merged
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -71,6 +71,7 @@ repos:
         base="examples/configs/dpo.yaml"; for f in examples/configs/recipes/llm/dpo-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
         base="examples/configs/grpo_math_1B.yaml"; for f in examples/configs/recipes/llm/grpo-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
         base="examples/configs/sft.yaml"; for f in examples/configs/recipes/llm/sft-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
+        base="examples/configs/distillation_math.yaml"; for f in examples/configs/recipes/llm/distillation-*.yaml; do [ -e "$f" ] && ./tools/config_cli.py minimize-check "$base" "$f"; done
   - id: configs-minimize-check-vlm
     name: minimize-check vlm recipes
     language: system
2 changes: 1 addition & 1 deletion docs/design-docs/generation.md
@@ -16,7 +16,7 @@ The core of the generation system is defined in `interfaces.py`, which establish
     max_new_tokens: int  # Maximum number of tokens to generate
     temperature: float  # Sampling temperature
     top_p: float  # Top-p sampling parameter
-    top_k: int  # Top-k sampling parameter
+    top_k: int | None  # Top-k sampling parameter
     model_name: str  # Name or path of the model
 ```
2 changes: 1 addition & 1 deletion examples/configs/distillation_math.yaml
@@ -103,7 +103,7 @@ policy: &POLICY_BASE
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
     bias_activation_fusion: True
-    defer_fp32_logits: null
+    defer_fp32_logits: False
 
   optimizer:
     optimizer: "adam"
2 changes: 1 addition & 1 deletion examples/configs/distillation_math_megatron.yaml
@@ -57,7 +57,7 @@ policy: &POLICY_BASE
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
     bias_activation_fusion: True
-    defer_fp32_logits: null
+    defer_fp32_logits: False
 
   optimizer:
     optimizer: "adam"
2 changes: 2 additions & 0 deletions examples/configs/dpo.yaml
@@ -113,6 +113,7 @@ policy:
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
+    defer_fp32_logits: False
 
   optimizer:
     optimizer: "adam"
@@ -155,6 +156,7 @@ policy:
     overlap_param_gather: true
     average_in_collective: true
     data_parallel_sharding_strategy: "optim_grads_params"
+    use_custom_fsdp: false
 
 data:
   max_input_seq_length: ${policy.max_total_sequence_length}
4 changes: 2 additions & 2 deletions examples/configs/grpo_math_1B.yaml
@@ -63,7 +63,7 @@ policy:
   tokenizer:
     name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
     chat_template_kwargs: null # can be used to pass kwargs to the chat template, e.g., enable_thinking=true
-  hf_config_overrides: null
+  hf_config_overrides: {}
   train_global_batch_size: 512
   train_micro_batch_size: 4
   generation_batch_size: 32 # Only used when generating using HF backend
@@ -103,7 +103,7 @@ policy:
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
-    defer_fp32_logits: null
+    defer_fp32_logits: False
 
   optimizer:
     optimizer: "adam"
@@ -8,7 +8,6 @@ loss_fn:
   kl_type: reverse
 checkpointing:
   checkpoint_dir: checkpoints/distillation-qwen3-32b-to-4b-base-long
-  save_period: 10
 policy:
   model_name: Qwen/Qwen3-4B-Base
   max_total_sequence_length: 20480
@@ -41,8 +41,6 @@ policy:
       NVTE_FP8_BLOCK_SCALING_FP32_SCALES: '1'
   generation:
     max_new_tokens: 4096
-    stop_token_ids:
-      - 128009
     vllm_cfg:
       precision: fp8
       gpu_memory_utilization: 0.5
@@ -32,8 +32,6 @@ policy:
     lr_warmup_init: 5.0e-08
   generation:
     max_new_tokens: 4096
-    stop_token_ids:
-      - 128009
     vllm_cfg:
       precision: fp8
       max_model_len: 4096
@@ -34,8 +34,6 @@ policy:
       - 13
   generation:
     max_new_tokens: 4096
-    stop_token_ids:
-      - 128009
     vllm_cfg:
       async_engine: true
       max_model_len: 4096
@@ -34,8 +34,6 @@ policy:
       - 13
   generation:
     max_new_tokens: 4096
-    stop_token_ids:
-      - 128009
     vllm_cfg:
       max_model_len: 4096
 data:
@@ -14,8 +14,6 @@ policy:
   make_sequence_length_divisible_by: 1
   generation:
     max_new_tokens: 512
-    stop_token_ids:
-      - 128009
     vllm_cfg:
       max_model_len: 512
 data:
@@ -19,8 +19,6 @@ policy:
   make_sequence_length_divisible_by: 1
   generation:
     max_new_tokens: 512
-    stop_token_ids:
-      - 128009
     vllm_cfg:
       max_model_len: 512
 data:
@@ -38,8 +38,6 @@ policy:
       - 13
   generation:
     max_new_tokens: 16384
-    stop_token_ids:
-      - 151643
     vllm_cfg:
       tensor_parallel_size: 4
       max_model_len: 16384
@@ -38,8 +38,6 @@ policy:
       - 13
   generation:
     max_new_tokens: 16384
-    stop_token_ids:
-      - 151643
     vllm_cfg:
       tensor_parallel_size: 4
       max_model_len: 16384
@@ -37,8 +37,6 @@ policy:
      - 13
   generation:
     max_new_tokens: 4096
-    stop_token_ids:
-      - 151645
     vllm_cfg:
       tensor_parallel_size: 4
       max_model_len: 4096
@@ -37,8 +37,6 @@ policy:
      - 13
   generation:
     max_new_tokens: 4096
-    stop_token_ids:
-      - 151645
     vllm_cfg:
       tensor_parallel_size: 4
       max_model_len: 4096
@@ -14,8 +14,6 @@ policy:
   make_sequence_length_divisible_by: 1
   generation:
     max_new_tokens: 512
-    stop_token_ids:
-      - 151645
     vllm_cfg:
       max_model_len: 512
 data:
5 changes: 4 additions & 1 deletion examples/configs/sft.yaml
@@ -35,6 +35,7 @@ policy:
 
   dtensor_cfg:
     enabled: true
+    env_vars: {}
     cpu_offload: False
     sequence_parallel: false
     activation_checkpointing: false
@@ -73,6 +74,7 @@ policy:
   ## ignored since enabled=false, but needed for testing purposes
   megatron_cfg:
     enabled: false
+    env_vars: {}
     empty_unused_memory_level: 1
     activation_checkpointing: false
     tensor_model_parallel_size: 1
@@ -90,7 +92,8 @@ policy:
     moe_router_bias_update_rate: 1e-3
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
-    apply_rope_fusion: True
+    apply_rope_fusion: True
+    defer_fp32_logits: False
 
   optimizer:
     optimizer: "adam"
1 change: 1 addition & 0 deletions examples/configs/sft_openmathinstruct2_megatron.yaml
@@ -40,6 +40,7 @@ policy:
     grad_reduce_in_fp32: true
     overlap_grad_reduce: true
     overlap_param_gather: true
+    use_custom_fsdp: false
     empty_unused_memory_level: 1
     enabled: true
     expert_tensor_parallel_size: 1
6 changes: 5 additions & 1 deletion examples/configs/vlm_grpo_3B.yaml
@@ -93,7 +93,7 @@ policy:
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
-    defer_fp32_logits: null
+    defer_fp32_logits: False
 
   optimizer:
     optimizer: "adam"
@@ -116,6 +116,10 @@ policy:
     use_distributed_optimizer: true
     use_precision_aware_optimizer: true
 
+    # optimizer cpu offload
+    optimizer_cpu_offload: false
+    optimizer_offload_fraction: 0.0
+
     clip_grad: ${policy.max_grad_norm}
 
     scheduler:
4 changes: 3 additions & 1 deletion examples/configs/vlm_grpo_3B_megatron.yaml
@@ -77,7 +77,6 @@ policy:
       logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
       algorithm: modified_first_fit_decreasing
       sequence_length_round: 64
-  optimizer: null
   scheduler:
     - name: torch.optim.lr_scheduler.LinearLR
       kwargs:
@@ -133,6 +132,7 @@ policy:
     moe_router_bias_update_rate: 0.0
     moe_permute_fusion: false
     apply_rope_fusion: true
+    defer_fp32_logits: False
   optimizer:
     optimizer: adam
     lr: 2.0e-07
@@ -147,6 +147,8 @@ policy:
     sgd_momentum: 0.9
     use_distributed_optimizer: true
     use_precision_aware_optimizer: true
+    optimizer_cpu_offload: false
+    optimizer_offload_fraction: 0.0
     clip_grad: ${policy.max_grad_norm}
     scheduler:
       start_weight_decay: ${policy.megatron_cfg.optimizer.weight_decay}
3 changes: 2 additions & 1 deletion nemo_rl/algorithms/loss_functions.py
@@ -39,7 +39,8 @@ class ClippedPGLossConfig(TypedDict):
     reference_policy_kl_penalty: float
     ratio_clip_min: float
     ratio_clip_max: float
-    ratio_clip_c: float
+    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). None to disable.
+    ratio_clip_c: float | None
Comment on lines +42 to +43 (Contributor):

⚠️ Potential issue | 🟠 Major

Use NotRequired[float] instead of float | None for optional configuration.

The coding guidelines specify: "Express configuration optionality via TypedDict using typing.NotRequired". The current implementation uses float | None, which means the key must be present in the config (though its value may be None), whereas NotRequired[float] indicates the key may be absent entirely.

Apply this diff to align with the coding guidelines:

-    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). None to disable.
-    ratio_clip_c: float | None
+    # Dual-clipping value (should be >1 if enabled; usually set to 3 empirically). Omit or set to None to disable.
+    ratio_clip_c: NotRequired[float | None]

Additionally, update line 111 to read the key with cfg.get so a missing key defaults to None (disabling dual-clipping):

    self.ratio_clip_c = cfg.get("ratio_clip_c", None)  # set to None to disable dual-clipping

     use_on_policy_kl_approximation: bool
     use_importance_sampling_correction: bool
     truncated_importance_sampling_ratio: float | None