Fix num_train_epochs=None causing TypeError in GRPOConfig #3972

Merged: danielhanchen merged 1 commit into main from fix-num-train-epochs-none on Feb 3, 2026

@danielhanchen danielhanchen commented Feb 3, 2026

Summary

  • Fix the TypeError raised when num_train_epochs=None is passed to GRPOConfig
  • Convert None to 3.0 (the default) before Trainer init; when max_steps is set, it still controls the actual training duration

Problem

When users pass num_train_epochs=None to GRPOConfig, the Trainer initialization fails with:

TypeError: '>' not supported between instances of 'NoneType' and 'int'

This happens at transformers/training_args.py:290 where Trainer does:

if args.num_train_epochs > 0:  # Fails when None
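
The failure can be reproduced in plain Python, without transformers installed. This is a minimal sketch: `check_epochs` is a hypothetical stand-in for the comparison quoted above, not the library's code.

```python
def check_epochs(num_train_epochs):
    # Hypothetical stand-in for the `args.num_train_epochs > 0` comparison.
    return num_train_epochs > 0


try:
    check_epochs(None)
    error_message = None
except TypeError as exc:
    # Python 3 refuses to order None against an int.
    error_message = str(exc)

print(error_message)
```

Any numeric value, such as the default 3.0, passes the same comparison without error.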

Solution

Add a check in the generated RLConfig code to convert None to 3.0:

if num_train_epochs is None:
    num_train_epochs = 3.0  # Default to 3 epochs if None, max_steps will override

The actual training duration is still controlled by max_steps since it takes precedence when both are set.
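
The precedence described here can be modelled in a few lines. This is a sketch under assumptions: both function names are hypothetical, and `max_steps=-1` is assumed to mean "not set", matching the TrainingArguments default.

```python
def normalize_num_train_epochs(num_train_epochs, default=3.0):
    # Mirrors the generated check: None falls back to the default of 3.0.
    if num_train_epochs is None:
        return default
    return num_train_epochs


def effective_duration(num_train_epochs, max_steps=-1):
    # Illustrative model only: a positive max_steps takes precedence over epochs.
    epochs = normalize_num_train_epochs(num_train_epochs)
    if max_steps > 0:
        return ("steps", max_steps)
    return ("epochs", epochs)


print(effective_duration(None, max_steps=500))  # ('steps', 500)
print(effective_duration(None))                 # ('epochs', 3.0)
```

When max_steps is left unset, the converted value of 3.0 epochs is what determines the length of the run.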

Test

from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    num_train_epochs=None,  # Previously caused TypeError
    max_steps=500,          # This controls actual duration
    ...
)
# Now works - num_train_epochs converted to 3.0, training runs for 500 steps

Tested with TRL 0.27.1 and Unsloth main.

@gemini-code-assist

Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug where the Trainer initialization would fail with a TypeError if num_train_epochs was explicitly set to None within the GRPOConfig. The solution involves a pre-emptive conversion of num_train_epochs from None to 1 within the generated RLConfig code, ensuring that the Trainer's internal checks pass without error. This change maintains the intended behavior where max_steps dictates the training length when both parameters are provided.

Highlights

  • Bug Fix: Resolves a TypeError that occurred when num_train_epochs=None was passed to GRPOConfig during Trainer initialization.
  • Type Handling: Introduces a check to convert num_train_epochs from None to 1 before the Trainer is initialized, preventing the type comparison error.
  • Training Duration: Confirms that max_steps will still take precedence and control the actual training duration when both num_train_epochs and max_steps are set.

Changelog

  • unsloth/models/rl.py
    • Added a conditional check for num_train_epochs within the _patch_trl_rl_trainers function.
    • If num_train_epochs is None, it is now explicitly set to 1 to avoid a TypeError during Trainer instantiation.

Activity

  • The pull request was authored by danielhanchen.
  • The author provided a clear summary of the problem, detailing the TypeError encountered when num_train_epochs=None is used.
  • The proposed solution, involving a None to 1 conversion for num_train_epochs, is well-explained.
  • A test case demonstrating the fix with GRPOConfig and GRPOTrainer is included, confirming that the issue is resolved and max_steps still controls duration.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6f335f3f9

Comment thread unsloth/models/rl.py (outdated)
Comment on lines +896 to +899

    if "num_train_epochs" in call_args:
        num_train_epochs_check = (
            "if num_train_epochs is None:\n"
            "    num_train_epochs = 1 # Default to 1 epoch if None, max_steps will override\n"

[P2] Gate None→1 epoch fallback on max_steps

This always forces num_train_epochs=None to 1, even when max_steps is not set. In that case training will silently run for a single epoch instead of the default TrainingArguments value (or the previous TypeError), so users who pass None expecting the default or an explicit failure will now get a shortened run. The commit message says “max_steps will override,” but this code doesn’t check that condition. Consider only applying the fallback when max_steps is provided (or using the Trainer default) to avoid unexpected 1‑epoch training.
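
One possible reading of this suggestion, as a sketch only: `gated_num_train_epochs` is a hypothetical helper, `max_steps=-1` is assumed to mean "not set", and the 3.0 fallback is assumed to be the Trainer default.

```python
def gated_num_train_epochs(num_train_epochs, max_steps=-1):
    # Explicit values pass through untouched.
    if num_train_epochs is not None:
        return num_train_epochs
    if max_steps is not None and max_steps > 0:
        # max_steps decides the duration, so this value is only a placeholder
        # that lets the `> 0` comparison succeed.
        return 1.0
    # No max_steps given: use the assumed default rather than silently
    # shortening the run to a single epoch.
    return 3.0


print(gated_num_train_epochs(None, max_steps=500))  # 1.0
print(gated_num_train_epochs(None))                 # 3.0
```

Raising an explicit error in the final branch would be the other option the comment mentions for users who expect a failure.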

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request addresses a TypeError when num_train_epochs=None is used with GRPOConfig. The fix correctly defaults num_train_epochs to 1 in this case, preventing the error while allowing max_steps to control the training duration as intended. The implementation is consistent with the existing code-patching style in the file. The change is well-scoped and effectively resolves the issue.
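
For readers unfamiliar with the code-patching style mentioned here, the general idea is to assemble Python source as strings and execute it. The sketch below is illustrative and is not Unsloth's actual implementation: `build_init_source` and `patched_init` are hypothetical names, and the 3.0 fallback follows the PR description.

```python
def build_init_source(call_args):
    # Assemble the source of a small function line by line (hypothetical helper).
    lines = ["def patched_init(num_train_epochs=3.0, max_steps=-1):"]
    if "num_train_epochs" in call_args:
        # Inject the None check only when the argument exists in the signature.
        lines.append("    if num_train_epochs is None:")
        lines.append("        num_train_epochs = 3.0  # fall back to the default")
    lines.append(
        "    return {'num_train_epochs': num_train_epochs, 'max_steps': max_steps}"
    )
    return "\n".join(lines) + "\n"


source = build_init_source(["num_train_epochs", "max_steps"])
namespace = {}
exec(source, namespace)  # compile the generated source into a callable
patched_init = namespace["patched_init"]

print(patched_init(num_train_epochs=None, max_steps=500))
# {'num_train_epochs': 3.0, 'max_steps': 500}
```

Because the check is emitted as text, it appears in the generated config class only when `num_train_epochs` is one of that class's arguments.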

When users pass `num_train_epochs=None` to GRPOConfig (relying on
max_steps to control training duration), Trainer.__init__ fails with:

  TypeError: '>' not supported between instances of 'NoneType' and 'int'

This happens because transformers.Trainer does `args.num_train_epochs > 0`
in its __init__ which fails when the value is None.

This fix converts None to 3.0 (the default) before Trainer initialization.
The actual training duration is still controlled by max_steps since it
takes precedence when both are set.

Example that now works:
```python
config = GRPOConfig(
    num_train_epochs=None,  # Previously caused TypeError
    max_steps=500,          # This controls actual duration
    ...
)
```
@danielhanchen danielhanchen force-pushed the fix-num-train-epochs-none branch from e6f335f to 2143f43 on February 3, 2026 10:45
@danielhanchen danielhanchen merged commit 02afd40 into main Feb 3, 2026
4 checks passed
@danielhanchen danielhanchen deleted the fix-num-train-epochs-none branch February 3, 2026 10:48
abiswas-realadvice pushed a commit to abiswas-realadvice/unsloth that referenced this pull request May 14, 2026