diff --git a/docs/guides/grpo-deepscaler.md b/docs/guides/grpo-deepscaler.md
index 5beddf1689..203934324f 100644
--- a/docs/guides/grpo-deepscaler.md
+++ b/docs/guides/grpo-deepscaler.md
@@ -33,7 +33,9 @@ Throughout training, the checkpoints of the model will be saved to the `results`
 
 ```sh
 uv run examples/run_eval.py \
-    generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf
+    generation.model_name=results/grpo-deepscaler-1.5b-8K/step_240/hf \
+    data.prompt_file=examples/prompts/cot.txt \
+    generation.vllm_cfg.max_model_len=32768
 ```
 
 Use `generation.model_name` to specify the path to the Hugging Face checkpoint. In addition, we use AIME24 as the validation dataset and calculate pass@1 on it throughout training.