Fix num_train_epochs=None causing TypeError in GRPOConfig#3972
Conversation
Summary of ChangesHello @danielhanchen, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical bug where the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6f335f3f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if "num_train_epochs" in call_args: | ||
| num_train_epochs_check = ( | ||
| "if num_train_epochs is None:\n" | ||
| " num_train_epochs = 1 # Default to 1 epoch if None, max_steps will override\n" |
There was a problem hiding this comment.
Gate None→1 epoch fallback on max_steps
This always forces num_train_epochs=None to 1, even when max_steps is not set. In that case training will silently run for a single epoch instead of the default TrainingArguments value (or the previous TypeError), so users who pass None expecting the default or an explicit failure will now get a shortened run. The commit message says “max_steps will override,” but this code doesn’t check that condition. Consider only applying the fallback when max_steps is provided (or using the Trainer default) to avoid unexpected 1‑epoch training.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request addresses a TypeError when num_train_epochs=None is used with GRPOConfig. The fix correctly defaults num_train_epochs to 1 in this case, preventing the error while allowing max_steps to control the training duration as intended. The implementation is consistent with the existing code-patching style in the file. The change is well-scoped and effectively resolves the issue.
When users pass `num_train_epochs=None` to GRPOConfig (relying on
max_steps to control training duration), Trainer.__init__ fails with:
TypeError: '>' not supported between instances of 'NoneType' and 'int'
This happens because transformers.Trainer does `args.num_train_epochs > 0`
in its __init__ which fails when the value is None.
This fix converts None to 3.0 (the default) before Trainer initialization.
The actual training duration is still controlled by max_steps since it
takes precedence when both are set.
Example that now works:
```python
config = GRPOConfig(
num_train_epochs=None, # Previously caused TypeError
max_steps=500, # This controls actual duration
...
)
```
e6f335f to
2143f43
Compare
…3972) When users pass `num_train_epochs=None` to GRPOConfig (relying on max_steps to control training duration), Trainer.__init__ fails with: TypeError: '>' not supported between instances of 'NoneType' and 'int' This happens because transformers.Trainer does `args.num_train_epochs > 0` in its __init__ which fails when the value is None. This fix converts None to 3.0 (the default) before Trainer initialization. The actual training duration is still controlled by max_steps since it takes precedence when both are set. Example that now works: ```python config = GRPOConfig( num_train_epochs=None, # Previously caused TypeError max_steps=500, # This controls actual duration ... ) ```
Summary
num_train_epochs=Noneis passed to GRPOConfigProblem
When users pass
num_train_epochs=Noneto GRPOConfig, the Trainer initialization fails with:This happens at
transformers/training_args.py:290where Trainer does:Solution
Add a check in the generated RLConfig code to convert None to 3.0:
The actual training duration is still controlled by
max_stepssince it takes precedence when both are set.Test
Tested with TRL 0.27.1 and Unsloth main.