
Trl 0.27.0 update #3965

Merged
danielhanchen merged 18 commits into unslothai:main from pluesclues:trl_0.27.0_update
Feb 5, 2026

@pluesclues
Collaborator

This PR makes GRPO compatible with trl 0.27.0. It assumes notebooks are set up like the ones in the unsloth notebooks repo. Relies on unslothai/unsloth-zoo#457.

@gemini-code-assist
Contributor

Summary of Changes

Hello @pluesclues, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on updating the Unsloth library to maintain seamless compatibility with the latest versions of the trl library, specifically targeting version 0.27.0. The changes primarily involve adapting Unsloth's internal patching mechanisms to accommodate modifications in trl's gradient checkpointing, PEFT integration, and the way completion data is processed. Additionally, it enhances the loss computation logic for GRPO-based trainers by introducing detailed clipping ratio metrics, which are essential for monitoring and debugging reinforcement learning training processes.

Highlights

  • TRL Version Compatibility: Updates Unsloth's integration with the trl library to ensure compatibility with version 0.27.0 and newer, adapting to recent changes in trl's internal mechanisms.
  • Gradient Checkpointing Logic Adjustment: Modifies the GRPOConfig to remove a specific transformers version check related to gradient checkpointing, aligning with trl's updated handling for versions 0.27.0 and above.
  • PEFT Initialization Handling: Removes TRL's default PEFT initialization logic for trl versions 0.26.0 and higher, allowing Unsloth to manage or bypass this aspect for optimized performance.
  • Enhanced Loss Computation Metrics: Introduces new metrics for tracking clipping ratios (e.g., low_clip, high_clip, clip_ratio, cispo_clip_ratio) within the compute_loss function for GRPO-like algorithms, providing deeper insights into training dynamics.
  • Completion Data Handling Update: Adjusts the _calculate_rewards function to utilize completions_text instead of completions for trl versions 0.25.0 and above, reflecting changes in how TRL passes completion data.
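
The clipping-ratio metrics named in the highlights can be illustrated with a small sketch. It assumes the usual PPO/GRPO-style definition: coef_1 is the per-token probability ratio, a token counts as clipped low when the ratio falls below 1 - epsilon_low with a negative advantage, and clipped high when it exceeds 1 + epsilon_high with a positive advantage. The function name clip_ratio_metrics, its parameters, and the epsilon defaults are illustrative assumptions, not the code added by this PR.

```python
import torch


def clip_ratio_metrics(coef_1, advantages, completion_mask,
                       epsilon_low=0.2, epsilon_high=0.2):
    """Fraction of unmasked completion tokens in each clipping region.

    coef_1:          (batch, seq) per-token probability ratio
    advantages:      (batch,) per-sequence advantage
    completion_mask: (batch, seq) 1 for real tokens, 0 for padding
    """
    adv = advantages.unsqueeze(1)  # broadcast over the sequence dimension
    is_low = (coef_1 < 1 - epsilon_low) & (adv < 0)
    is_high = (coef_1 > 1 + epsilon_high) & (adv > 0)
    is_region = is_low | is_high

    mask = completion_mask.float()
    n_tokens = mask.sum().clamp(min=1.0)  # avoid dividing by zero

    def masked_mean(flags):
        # Mean of a boolean tensor over unmasked tokens only.
        return ((flags.float() * mask).sum() / n_tokens).item()

    return {
        "clip_ratio/low_mean": masked_mean(is_low),
        "clip_ratio/high_mean": masked_mean(is_high),
        "clip_ratio/region_mean": masked_mean(is_region),
    }


# Two sequences of three tokens; the last token of the second is padding.
coef_1 = torch.tensor([[0.5, 1.0, 1.5], [1.5, 1.0, 0.5]])
advantages = torch.tensor([-1.0, 1.0])
completion_mask = torch.tensor([[1, 1, 1], [1, 1, 0]])
metrics = clip_ratio_metrics(coef_1, advantages, completion_mask)
```

With these inputs, one of the five unmasked tokens is clipped low and one is clipped high, so the region ratio is the sum of the two. In distributed training the per-process values would still need to be gathered before logging, which this sketch leaves out.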


Changelog
  • unsloth/models/rl.py
    • Removed a transformers version check for gradient checkpointing in GRPOConfig when trl version is 0.27.0 or greater.
    • Removed TRL's PEFT initialization logic for trl versions 0.26.0 or greater.
  • unsloth/models/rl_replacements.py
    • Added import for trl.__version__ and implemented robust trl version detection using Version objects.
    • Modified grpo_trainer__generate_and_score_completions to use completions_text for _calculate_rewards when trl version is 0.25.0 or greater.
    • Updated the return signature of grpo_compute_loss_slow and grpo_accumulated_loss to include an additional coef_1 value.
    • Introduced new metrics for tracking clipping ratios (clip_ratio/low_mean, clip_ratio/high_mean, clip_ratio/region_mean, cispo_clip_ratio) within the compute_loss function for various loss types.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e09f7165d7


Comment on lines 1002 to 1004
  # to ensure backwards compatibility with trl 0.15.2 and maybe even 0.17
- loss, completion_length, mean_kl = grpo_accumulated_loss(
+ loss, completion_length, mean_kl, coef_1 = grpo_accumulated_loss(
      trainer = self,

P2 Badge Preserve backward-compatibility return arity

The backward-compatibility branch now unpacks four values from grpo_accumulated_loss, but that path exists specifically for older TRL versions (“0.15.2 and maybe even 0.17”) where the function historically returned only three values. In those environments, this will raise ValueError: not enough values to unpack at runtime, defeating the compatibility fallback. Consider guarding on the returned tuple length or only unpacking coef_1 when the underlying implementation provides it.
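
One way to guard on the returned tuple length, as suggested above, is sketched below. The helper name unpack_loss_outputs is hypothetical, and using None as the stand-in for a missing coef_1 is an assumption; the caller would have to skip the clip-ratio metrics in that case.

```python
def unpack_loss_outputs(outputs):
    """Accept a 3-tuple (older return shape) or a 4-tuple (with coef_1)."""
    if len(outputs) == 4:
        loss, completion_length, mean_kl, coef_1 = outputs
    elif len(outputs) == 3:
        loss, completion_length, mean_kl = outputs
        coef_1 = None  # older implementations do not provide it
    else:
        raise ValueError(f"expected 3 or 4 values, got {len(outputs)}")
    return loss, completion_length, mean_kl, coef_1


# Plain numbers stand in for the tensors a real loss function would return.
old_style = unpack_loss_outputs((0.5, 12.0, 0.01))
new_style = unpack_loss_outputs((0.5, 12.0, 0.01, [1.0, 0.9]))
```

The call site then always unpacks four values and checks `coef_1 is not None` before computing any metric that depends on it.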


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates unsloth to be compatible with trl version 0.27.0. The changes primarily involve monkey-patching TRL's trainers (GRPOTrainer) and configuration classes to handle API changes and remove conflicting logic, especially for newer TRL versions. The modifications are done through string manipulation of source code, which is consistent with the existing patching mechanism in the codebase.
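
For readers unfamiliar with this style of patching, the general idea can be sketched as follows. This is a toy illustration, not the mechanism in unsloth/models/rl.py: the names ORIGINAL_SOURCE and patch_source are hypothetical, and the source is kept as a string literal here, whereas a real patcher would typically obtain it with inspect.getsource.

```python
import textwrap

# Stand-in for source text that would normally come from inspect.getsource().
ORIGINAL_SOURCE = textwrap.dedent("""
    def compute_loss(outputs):
        loss, completion_length, mean_kl = outputs
        return loss
""")


def patch_source(source, old, new):
    """Replace `old` with `new`, failing loudly if the target is missing."""
    if old not in source:
        raise ValueError(f"patch target not found: {old!r}")
    return source.replace(old, new)


patched = patch_source(
    ORIGINAL_SOURCE,
    "loss, completion_length, mean_kl = outputs",
    "loss, completion_length, mean_kl, coef_1 = outputs",
)

# Compile the patched text into a fresh namespace and pull the function out.
namespace = {}
exec(patched, namespace)
compute_loss = namespace["compute_loss"]
result = compute_loss((0.5, 12.0, 0.01, 1.0))
```

Raising when the target string is absent matters for this approach: an upstream refactor can silently turn a plain str.replace into a no-op, which is the kind of breakage a version bump like this one has to chase down.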

My review found a critical runtime error due to the use of undefined functions (nanmin, nanmax) and a medium-severity issue related to code style and robustness in version detection logic. I've provided suggestions to fix these issues.

Comment on lines +1101 to +1104

gathered_low_clip = self.accelerator.gather(low_clip)
self._metrics[mode]["clip_ratio/low_mean"].append(
gathered_low_clip.nanmean().item()

critical

The functions nanmin and nanmax are used here without being defined or imported. This will lead to a NameError at runtime. You should use torch.nanmin and torch.nanmax instead, as torch is available in the execution context of this code.

Suggested change
gathered_low_clip = self.accelerator.gather(low_clip)
self._metrics[mode]["clip_ratio/low_mean"].append(
    gathered_low_clip.nanmean().item()
)
self._metrics[mode]["clip_ratio/low_min"].append(torch.nanmin(gathered_low_clip).item())
gathered_high_clip = self.accelerator.gather(high_clip)
self._metrics[mode]["clip_ratio/high_mean"].append(gathered_high_clip.nanmean().item())
self._metrics[mode]["clip_ratio/high_max"].append(torch.nanmax(gathered_high_clip).item())
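
A caveat on the suggestion above: PyTorch ships nanmean, nansum and nanmedian, but torch.nanmin and torch.nanmax do not appear to exist in current releases, so those calls may fail as well. A minimal sketch of NaN-aware helpers is given below, assuming nothing equivalent is already in scope in the patched module (TRL's own utilities may provide helpers with these names; that is not shown in this thread).

```python
import torch


def nanmin(tensor):
    """Minimum over non-NaN entries; returns NaN if every entry is NaN."""
    valid = tensor[~torch.isnan(tensor)]
    if valid.numel() == 0:
        return torch.tensor(float("nan"), dtype=tensor.dtype, device=tensor.device)
    return valid.min()


def nanmax(tensor):
    """Maximum over non-NaN entries; returns NaN if every entry is NaN."""
    valid = tensor[~torch.isnan(tensor)]
    if valid.numel() == 0:
        return torch.tensor(float("nan"), dtype=tensor.dtype, device=tensor.device)
    return valid.max()


# Stand-in for the gathered per-process clip ratios, one of which is NaN.
gathered_low_clip = torch.tensor([0.25, float("nan"), 0.5])
low_min = nanmin(gathered_low_clip).item()
low_max = nanmax(gathered_low_clip).item()
all_nan = nanmin(torch.tensor([float("nan")])).item()
```

Filtering before reducing keeps a single NaN (for example from a process with no completion tokens) from poisoning the logged minimum or maximum.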

Comment on lines +60 to +66
try:
    trl_version = Version(trl_version_raw)
except Exception:
    try:
        trl_version = Version(importlib_version("trl"))
    except Exception:
        trl_version = Version("0.0.0")

medium

Using a broad except Exception: can hide unexpected errors and make debugging harder. It's better to catch more specific exceptions. For example, Version() might raise a ValueError, and importlib_version() can raise PackageNotFoundError (which you'd need to import). Using more specific exceptions would make this code more robust and maintainable.
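
A version of the fallback chain with narrower exception handling might look like the sketch below. It assumes the packaging library is available; InvalidVersion is the ValueError subclass that packaging raises for unparsable strings. The function name detect_version and its raw_version parameter are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version as importlib_version

from packaging.version import InvalidVersion, Version


def detect_version(package, raw_version=None):
    """Return a parsed Version for `package`, or Version("0.0.0") if unknown."""
    if raw_version is not None:
        try:
            return Version(raw_version)
        except (InvalidVersion, TypeError):
            pass  # unparsable or non-string value: fall back to metadata
    try:
        return Version(importlib_version(package))
    except (PackageNotFoundError, InvalidVersion):
        return Version("0.0.0")


parsed = detect_version("trl", raw_version="0.27.0")
# A package name assumed not to be installed, with an unparsable raw string.
fallback = detect_version("hypothetical-missing-package", raw_version="not a version")
```

Any other exception now propagates instead of being silently mapped to 0.0.0, which would otherwise send the code down the oldest compatibility path without warning.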

@danielhanchen danielhanchen merged commit 3be666e into unslothai:main Feb 5, 2026
1 check passed
abiswas-realadvice pushed a commit to abiswas-realadvice/unsloth that referenced this pull request May 14, 2026
* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl.py

* Update rl.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update rl_replacements.py

* Update rl.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update rl_replacements.py, remove chat template from codexes commits

* Update rl.py, got rid of gradient checkpointing code that did not work

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>