Fix num_train_epochs=None causing TypeError in GRPOConfig by danielhanchen · Pull Request #3972 · unslothai/unsloth

danielhanchen · 2026-02-03T10:36:22Z

Summary

Fix TypeError when num_train_epochs=None is passed to GRPOConfig
Convert None to 3.0 (the default) before Trainer initialization; max_steps still controls the actual training duration

Problem

When users pass num_train_epochs=None to GRPOConfig, the Trainer initialization fails with:

TypeError: '>' not supported between instances of 'NoneType' and 'int'

This happens at transformers/training_args.py:290 where Trainer does:

```python
if args.num_train_epochs > 0:  # Fails when None
```
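The failure is ordinary Python 3 semantics rather than anything specific to TRL: ordering comparisons between `None` and a number are not defined. Below is a minimal reproduction. `FakeArgs` and `trainer_style_check` are hypothetical stand-ins for the real training arguments and the Trainer check, not transformers APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeArgs:
    # Hypothetical stand-in for the real training arguments object.
    num_train_epochs: Optional[float] = None
    max_steps: int = -1


def trainer_style_check(args: FakeArgs) -> bool:
    # Mirrors the kind of comparison Trainer performs on num_train_epochs.
    return args.num_train_epochs > 0


try:
    trainer_style_check(FakeArgs(num_train_epochs=None, max_steps=500))
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'NoneType' and 'int'
```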

Solution

Add a check in the generated RLConfig code to convert None to 3.0:

```python
if num_train_epochs is None:
    num_train_epochs = 3.0  # Default to 3 epochs if None, max_steps will override
```

The actual training duration is still controlled by max_steps, since a positive max_steps overrides num_train_epochs when both are set.
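To make the precedence explicit, here is a simplified model of how the two settings interact, assuming the documented `TrainingArguments` behaviour that a positive `max_steps` overrides `num_train_epochs`. `normalize_num_train_epochs` and `planned_steps` are hypothetical helpers, not Unsloth or transformers APIs, and `steps_per_epoch` is an assumed input; the real Trainer derives it from the dataloader length, gradient accumulation and world size.

```python
import math
from typing import Optional

DEFAULT_NUM_TRAIN_EPOCHS = 3.0  # default of TrainingArguments.num_train_epochs


def normalize_num_train_epochs(num_train_epochs: Optional[float]) -> float:
    # The same substitution the patch makes: None becomes the default epoch count.
    if num_train_epochs is None:
        return DEFAULT_NUM_TRAIN_EPOCHS
    return num_train_epochs


def planned_steps(num_train_epochs: Optional[float], max_steps: int, steps_per_epoch: int) -> int:
    # Simplified model of the precedence rule: a positive max_steps wins,
    # otherwise the run length is epochs * steps_per_epoch, rounded up.
    epochs = normalize_num_train_epochs(num_train_epochs)
    if max_steps > 0:
        return max_steps
    return math.ceil(epochs * steps_per_epoch)


print(planned_steps(None, max_steps=500, steps_per_epoch=100))  # 500
print(planned_steps(None, max_steps=-1, steps_per_epoch=100))   # 300
```

The second call shows the trade-off of an unconditional substitution: without a positive `max_steps`, `None` quietly becomes a three-epoch run.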

Test

```python
from trl import GRPOConfig, GRPOTrainer

config = GRPOConfig(
    num_train_epochs=None,  # Previously caused TypeError
    max_steps=500,          # This controls actual duration
    ...
)
# Now works - num_train_epochs converted to 3.0, training runs for 500 steps
```

Tested with TRL 0.27.1 and Unsloth main.

gemini-code-assist · 2026-02-03T10:36:36Z

Summary of Changes

Hello @danielhanchen, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug where the Trainer initialization would fail with a TypeError if num_train_epochs was explicitly set to None within the GRPOConfig. The solution involves a pre-emptive conversion of num_train_epochs from None to 1 within the generated RLConfig code, ensuring that the Trainer's internal checks pass without error. This change maintains the intended behavior where max_steps dictates the training length when both parameters are provided.

Highlights

Bug Fix: Resolves a TypeError that occurred when num_train_epochs=None was passed to GRPOConfig during Trainer initialization.
Type Handling: Introduces a check to convert num_train_epochs from None to 1 before the Trainer is initialized, preventing the type comparison error.
Training Duration: Confirms that max_steps will still take precedence and control the actual training duration when both num_train_epochs and max_steps are set.


Changelog

unsloth/models/rl.py
- Added a conditional check for num_train_epochs within the _patch_trl_rl_trainers function.
- If num_train_epochs is None, it is now explicitly set to 1 to avoid a TypeError during Trainer instantiation.
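The guard is not written into TRL directly; per the changelog it is emitted as a source string inside `_patch_trl_rl_trainers`, which generates the RLConfig code. The sketch below illustrates that general technique (indenting a guard into a generated `__init__` and compiling it with `exec`) under stated assumptions: `BaseConfig`, `build_config_class` and the template are hypothetical and far simpler than Unsloth's actual generator, and the guard uses 3.0 as in the PR summary (the reviewed commit used 1).

```python
import textwrap

# Guard emitted as a source string, in the style of the diff in this PR
# (using the value 3.0 from the PR summary).
NUM_TRAIN_EPOCHS_CHECK = (
    "if num_train_epochs is None:\n"
    "    num_train_epochs = 3.0  # max_steps will override\n"
)


class BaseConfig:
    # Hypothetical stand-in for a config class that compares the value numerically.
    def __init__(self, num_train_epochs: float = 3.0, max_steps: int = -1):
        if num_train_epochs <= 0:  # TypeError here if None slips through
            raise ValueError("num_train_epochs must be positive")
        self.num_train_epochs = num_train_epochs
        self.max_steps = max_steps


def build_config_class(base: type, guard_source: str) -> type:
    # Generate a subclass whose __init__ runs the guard before delegating to base.
    source = (
        "class PatchedConfig(base):\n"
        "    def __init__(self, num_train_epochs=None, max_steps=-1):\n"
        + textwrap.indent(guard_source, " " * 8)
        + "        super().__init__(num_train_epochs=num_train_epochs, max_steps=max_steps)\n"
    )
    namespace = {"base": base}
    exec(source, namespace)
    return namespace["PatchedConfig"]


PatchedConfig = build_config_class(BaseConfig, NUM_TRAIN_EPOCHS_CHECK)
config = PatchedConfig(num_train_epochs=None, max_steps=500)
print(config.num_train_epochs, config.max_steps)  # 3.0 500
```

Generating the guard as text keeps it next to the other generated argument handling, at the cost of code that is only checked when the generated source is compiled.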

Activity

The pull request was authored by danielhanchen.
The author provided a clear summary of the problem, detailing the TypeError encountered when num_train_epochs=None is used.
The proposed solution, involving a None to 1 conversion for num_train_epochs, is well-explained.
A test case demonstrating the fix with GRPOConfig and GRPOTrainer is included, confirming that the issue is resolved and max_steps still controls duration.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6f335f3f9


chatgpt-codex-connector · 2026-02-03T10:37:37Z

+    if "num_train_epochs" in call_args:
+        num_train_epochs_check = (
+            "if num_train_epochs is None:\n"
+            "    num_train_epochs = 1  # Default to 1 epoch if None, max_steps will override\n"


Gate None→1 epoch fallback on max_steps

This always forces num_train_epochs=None to 1, even when max_steps is not set. In that case training will silently run for a single epoch instead of the default TrainingArguments value (or the previous TypeError), so users who pass None expecting the default or an explicit failure will now get a shortened run. The commit message says “max_steps will override,” but this code doesn’t check that condition. Consider only applying the fallback when max_steps is provided (or using the Trainer default) to avoid unexpected 1‑epoch training.
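The gating the reviewer proposes for the reviewed commit could look like the following sketch. It is not the merged code (the final PR description substitutes 3.0 unconditionally), and `resolve_num_train_epochs` with its `strict` flag is hypothetical. The 1-epoch placeholder is used only when a positive `max_steps` is expected to override it; otherwise the function either raises or returns the `TrainingArguments` default of 3.0.

```python
from typing import Optional

TRAINER_DEFAULT_EPOCHS = 3.0  # default of TrainingArguments.num_train_epochs
PLACEHOLDER_EPOCHS = 1.0      # value used by the reviewed commit


def resolve_num_train_epochs(
    num_train_epochs: Optional[float], max_steps: int, strict: bool = False
) -> float:
    # An explicit value is always respected.
    if num_train_epochs is not None:
        return num_train_epochs
    # A positive max_steps overrides epochs, so the placeholder should not shorten the run.
    if max_steps > 0:
        return PLACEHOLDER_EPOCHS
    # Neither is usable: fail loudly, or fall back to the default instead of 1 epoch.
    if strict:
        raise ValueError("num_train_epochs=None requires a positive max_steps")
    return TRAINER_DEFAULT_EPOCHS
```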


gemini-code-assist

Code Review

This pull request addresses a TypeError when num_train_epochs=None is used with GRPOConfig. The fix correctly defaults num_train_epochs to 1 in this case, preventing the error while allowing max_steps to control the training duration as intended. The implementation is consistent with the existing code-patching style in the file. The change is well-scoped and effectively resolves the issue.

When users pass `num_train_epochs=None` to GRPOConfig (relying on max_steps to control training duration), Trainer.__init__ fails with:

TypeError: '>' not supported between instances of 'NoneType' and 'int'

This happens because transformers.Trainer does `args.num_train_epochs > 0` in its __init__, which fails when the value is None.

This fix converts None to 3.0 (the default) before Trainer initialization. The actual training duration is still controlled by max_steps since it takes precedence when both are set.

Example that now works:

```python
config = GRPOConfig(
    num_train_epochs=None,  # Previously caused TypeError
    max_steps=500,          # This controls actual duration
    ...
)
```


chatgpt-codex-connector Bot reviewed Feb 3, 2026


gemini-code-assist Bot reviewed Feb 3, 2026


danielhanchen force-pushed the fix-num-train-epochs-none branch from e6f335f to 2143f43 on February 3, 2026 10:45

danielhanchen merged commit 02afd40 into main Feb 3, 2026
4 checks passed

danielhanchen deleted the fix-num-train-epochs-none branch February 3, 2026 10:48
