Merged
4 changes: 2 additions & 2 deletions examples/grpo_trainer/run_qwen3moe-30b_megatron_96gb.sh
@@ -168,7 +168,7 @@ python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megat
actor_rollout_ref.rollout.free_cache_engine=True \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=${infer_ppo_micro_batch_size_per_gpu} \
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu=${infer_ppo_max_token_len} \
-actor_rollout_ref.ref.megatron.use_dist_checkpointing=True \
+actor_rollout_ref.ref.megatron.use_dist_checkpointing=${USE_DIST_CKPT} \
Contributor (critical):
Based on the pull request description, use_dist_checkpointing must be False for the reference model because it has no distributed checkpoint path. Using the ${USE_DIST_CKPT} variable makes the setting configurable, so if a user sets USE_DIST_CKPT=True (e.g., for the actor model), it would also be enabled for the reference model, likely causing a runtime error. To keep the script robust and prevent misconfiguration, this value should be hardcoded to False for the reference model.

Suggested change:
-actor_rollout_ref.ref.megatron.use_dist_checkpointing=${USE_DIST_CKPT} \
+actor_rollout_ref.ref.megatron.use_dist_checkpointing=False \
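A minimal sketch of the scoping the comment argues for, assuming the script's existing variable conventions (the REF_USE_DIST_CKPT name is hypothetical, not part of the script): the toggle stays configurable for the actor, while the reference model's flag is pinned to False so an actor-side USE_DIST_CKPT=True can never leak into it.

```shell
# Actor-side toggle, configurable by the user (default assumed here for illustration).
USE_DIST_CKPT=${USE_DIST_CKPT:-True}

# Hypothetical separate variable for the ref model: it has no distributed
# checkpoint path, so this must never follow USE_DIST_CKPT.
REF_USE_DIST_CKPT=False

echo "actor.megatron.use_dist_checkpointing=${USE_DIST_CKPT}"
echo "ref.megatron.use_dist_checkpointing=${REF_USE_DIST_CKPT}"
```

With this split, the launch command would pass ${USE_DIST_CKPT} only to actor_rollout_ref.actor.megatron.use_dist_checkpointing and the hardcoded False (or ${REF_USE_DIST_CKPT}) to the ref model, matching the suggested change above.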

actor_rollout_ref.ref.megatron.param_offload=${offload} \
actor_rollout_ref.ref.megatron.tensor_model_parallel_size=${REF_TP} \
actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=${REF_PP} \
@@ -192,4 +192,4 @@ python3 -m verl.trainer.main_ppo --config-path=./config --config-name='ppo_megat
trainer.save_freq=100 \
trainer.total_epochs=10 \
trainer.resume_mode=auto \
-trainer.log_val_generations=10
+trainer.log_val_generations=10