Trl 0.27.0 update by pluesclues · Pull Request #3965 · unslothai/unsloth

pluesclues · 2026-02-03T03:43:42Z

This PR makes GRPO compatible with trl 0.27.0. It assumes notebooks are structured like the ones in the unsloth notebooks repo. Relies on unslothai/unsloth-zoo#457.

gemini-code-assist · 2026-02-03T03:44:02Z

Summary of Changes

Hello @pluesclues, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on updating the Unsloth library to maintain seamless compatibility with the latest versions of the trl library, specifically targeting version 0.27.0. The changes primarily involve adapting Unsloth's internal patching mechanisms to accommodate modifications in trl's gradient checkpointing, PEFT integration, and the way completion data is processed. Additionally, it enhances the loss computation logic for GRPO-based trainers by introducing detailed clipping ratio metrics, which are essential for monitoring and debugging reinforcement learning training processes.

Highlights

- TRL Version Compatibility: Updates Unsloth's integration with the trl library to ensure compatibility with version 0.27.0 and newer, adapting to recent changes in trl's internal mechanisms.
- Gradient Checkpointing Logic Adjustment: Modifies the GRPOConfig to remove a specific transformers version check related to gradient checkpointing, aligning with trl's updated handling for versions 0.27.0 and above.
- PEFT Initialization Handling: Removes TRL's default PEFT initialization logic for trl versions 0.26.0 and higher, allowing Unsloth to manage or bypass this aspect for optimized performance.
- Enhanced Loss Computation Metrics: Introduces new metrics for tracking clipping ratios (e.g., low_clip, high_clip, clip_ratio, cispo_clip_ratio) within the compute_loss function for GRPO-like algorithms, providing deeper insights into training dynamics.
- Completion Data Handling Update: Adjusts the _calculate_rewards function to utilize completions_text instead of completions for trl versions 0.25.0 and above, reflecting changes in how TRL passes completion data.

Changelog

unsloth/models/rl.py
- Removed a transformers version check for gradient checkpointing in GRPOConfig when trl version is 0.27.0 or greater.
- Removed TRL's PEFT initialization logic for trl versions 0.26.0 or greater.
unsloth/models/rl_replacements.py
- Added import for trl.__version__ and implemented robust trl version detection using Version objects.
- Modified grpo_trainer__generate_and_score_completions to use completions_text for _calculate_rewards when trl version is 0.25.0 or greater.
- Updated the return signature of grpo_compute_loss_slow and grpo_accumulated_loss to include an additional coef_1 value.
- Introduced new metrics for tracking clipping ratios (clip_ratio/low_mean, clip_ratio/high_mean, clip_ratio/region_mean, cispo_clip_ratio) within the compute_loss function for various loss types.
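
As a rough illustration of what the clipping-ratio metrics listed above measure, the sketch below computes them over a flat list of per-token values. It assumes the usual PPO/GRPO-style definition: a token counts as low-clipped when its probability ratio falls below 1 - eps_low with a negative advantage, and as high-clipped when it exceeds 1 + eps_high with a positive advantage. The function name, the argument names, and the use of plain Python lists are illustrative assumptions, not the code in this PR; the ratio is assumed to correspond to the coef_1 value the loss functions now return.

```python
def clip_ratio_metrics(ratios, advantages, mask, eps_low=0.2, eps_high=0.2):
    """Return the fraction of unmasked tokens clipped low, high, and either way.

    ratios:     per-token probability ratios (assumed to match coef_1)
    advantages: per-token advantages
    mask:       1 for completion tokens that count, 0 for padding
    """
    low = high = total = 0
    for ratio, adv, keep in zip(ratios, advantages, mask):
        if not keep:
            continue  # padding tokens do not contribute to the metrics
        total += 1
        if ratio < 1.0 - eps_low and adv < 0:
            low += 1
        elif ratio > 1.0 + eps_high and adv > 0:
            high += 1
    if total == 0:
        return {"low": 0.0, "high": 0.0, "region": 0.0}
    return {
        "low": low / total,
        "high": high / total,
        "region": (low + high) / total,
    }


metrics = clip_ratio_metrics(
    ratios=[0.7, 1.0, 1.3, 1.5, 0.5],
    advantages=[-1.0, 1.0, 1.0, -1.0, -1.0],
    mask=[1, 1, 1, 1, 0],
)
print(metrics)
```

A steadily rising "region" value would suggest the policy is drifting far from the one that generated the samples, which is the kind of signal these metrics are meant to surface.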

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e09f7165d7

chatgpt-codex-connector · 2026-02-03T03:45:12Z

                # to ensure backwards compatibility with trl 0.15.2 and maybe even 0.17
-                loss, completion_length, mean_kl = grpo_accumulated_loss(
+                loss, completion_length, mean_kl, coef_1 = grpo_accumulated_loss(
                    trainer = self,


Preserve backward-compatibility return arity

The backward-compatibility branch now unpacks four values from grpo_accumulated_loss, but that path exists specifically for older TRL versions (“0.15.2 and maybe even 0.17”) where the function historically returned only three values. In those environments, this will raise ValueError: not enough values to unpack at runtime, defeating the compatibility fallback. Consider guarding on the returned tuple length or only unpacking coef_1 when the underlying implementation provides it.
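
A minimal sketch of the tuple-length guard suggested above. The helper name unpack_loss_outputs and the two stand-in loss functions are hypothetical; only the three-value and four-value return shapes come from the review comment.

```python
def unpack_loss_outputs(outputs):
    """Accept (loss, completion_length, mean_kl) with or without a trailing coef_1."""
    if len(outputs) == 4:
        loss, completion_length, mean_kl, coef_1 = outputs
    elif len(outputs) == 3:
        loss, completion_length, mean_kl = outputs
        coef_1 = None  # older implementations do not return the ratio
    else:
        raise ValueError(f"expected 3 or 4 return values, got {len(outputs)}")
    return loss, completion_length, mean_kl, coef_1


# Hypothetical stand-ins for an older and a newer loss implementation.
def old_accumulated_loss():
    return 0.5, 128, 0.01


def new_accumulated_loss():
    return 0.5, 128, 0.01, [1.0, 0.9]


old_result = unpack_loss_outputs(old_accumulated_loss())
new_result = unpack_loss_outputs(new_accumulated_loss())
print(old_result)
print(new_result)
```

Callers that log clipping metrics would then need to skip that step when coef_1 is None.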

gemini-code-assist

Code Review

This pull request updates unsloth to be compatible with trl version 0.27.0. The changes primarily involve monkey-patching TRL's trainers (GRPOTrainer) and configuration classes to handle API changes and remove conflicting logic, especially for newer TRL versions. The modifications are done through string manipulation of source code, which is consistent with the existing patching mechanism in the codebase.

My review found a critical runtime error due to the use of undefined functions (nanmin, nanmax) and a medium-severity issue related to code style and robustness in version detection logic. I've provided suggestions to fix these issues.
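
To make the "string manipulation of source code" approach concrete, here is a toy sketch of the general technique: take a function's source as text, replace a fragment, and re-execute the result. Every name in it (ORIGINAL_SOURCE, patch_source, calculate_rewards, generate_and_score) is hypothetical, and it mirrors the completions to completions_text change only in spirit; it is not the patching code in unsloth.

```python
# Toy source text standing in for a library function that would be patched.
ORIGINAL_SOURCE = '''
def calculate_rewards(inputs, prompts, completions):
    return [len(c) for c in completions]

def generate_and_score(inputs, prompts, completions, completions_text):
    return calculate_rewards(inputs, prompts, completions)
'''


def patch_source(source, old, new):
    """Replace `old` with `new`, failing loudly if the target text is absent."""
    if old not in source:
        raise ValueError(f"patch target not found: {old!r}")
    return source.replace(old, new)


patched = patch_source(
    ORIGINAL_SOURCE,
    "return calculate_rewards(inputs, prompts, completions)",
    "return calculate_rewards(inputs, prompts, completions_text)",
)

# Execute the patched source in a fresh namespace and call the result.
namespace = {}
exec(patched, namespace)
result = namespace["generate_and_score"](None, ["p"], [[1, 2, 3]], ["abcd"])
print(result)
```

Raising when the target string is missing is a deliberate choice in this sketch: a silent no-op replace is the usual way such patches break when the upstream source changes between versions.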

gemini-code-assist · 2026-02-03T03:45:47Z

+
+            gathered_low_clip = self.accelerator.gather(low_clip)
+            self._metrics[mode]["clip_ratio/low_mean"].append(
+                gathered_low_clip.nanmean().item()


The functions nanmin and nanmax are used here without being defined or imported. This will lead to a NameError at runtime. You should use torch.nanmin and torch.nanmax instead, as torch is available in the execution context of this code.

Suggested change

gathered_low_clip = self.accelerator.gather(low_clip)
self._metrics[mode]["clip_ratio/low_mean"].append(
    gathered_low_clip.nanmean().item()
)
self._metrics[mode]["clip_ratio/low_min"].append(torch.nanmin(gathered_low_clip).item())
gathered_high_clip = self.accelerator.gather(high_clip)
self._metrics[mode]["clip_ratio/high_mean"].append(gathered_high_clip.nanmean().item())
self._metrics[mode]["clip_ratio/high_max"].append(torch.nanmax(gathered_high_clip).item())
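
Worth checking before applying the suggestion above: PyTorch documents nanmean, nansum, and nanmedian, but torch.nanmin and torch.nanmax may not exist in the installed version, and TRL appears to ship its own helpers with those names. A minimal sketch of the intended semantics follows, written against plain Python floats so it runs without PyTorch. The function bodies are an assumption, not TRL's code; a tensor version would filter with torch.isnan in the same way.

```python
import math


def nanmin(values):
    """Smallest non-NaN value, or NaN if nothing but NaN (or nothing) is given."""
    kept = [v for v in values if not math.isnan(v)]
    return min(kept) if kept else float("nan")


def nanmax(values):
    """Largest non-NaN value, or NaN if nothing but NaN (or nothing) is given."""
    kept = [v for v in values if not math.isnan(v)]
    return max(kept) if kept else float("nan")


samples = [0.5, float("nan"), 0.1]
print(nanmin(samples), nanmax(samples))
```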

gemini-code-assist · 2026-02-03T03:45:47Z

+try:
+    trl_version = Version(trl_version_raw)
+except Exception:
+    try:
+        trl_version = Version(importlib_version("trl"))
+    except Exception:
+        trl_version = Version("0.0.0")


Using a broad except Exception: can hide unexpected errors and make debugging harder. It's better to catch more specific exceptions. For example, Version() might raise a ValueError, and importlib_version() can raise PackageNotFoundError (which you'd need to import). Using more specific exceptions would make this code more robust and maintainable.
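
A sketch of the narrower exception handling described above, using only the standard library. parse_version is a hypothetical stand-in for packaging.version.Version (whose InvalidVersion error is a ValueError subclass), and detect_version is an illustrative name; PackageNotFoundError and version come from importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version as importlib_version


def parse_version(raw):
    """Hypothetical stand-in for Version: 'X.Y.Z...' -> (X, Y, Z) as integers."""
    parts = raw.split(".")[:3]
    return tuple(int(p) for p in parts)  # int() raises ValueError on bad input


def detect_version(package, raw=None, default=(0, 0, 0)):
    """Prefer an explicit version string, then installed metadata, then a default."""
    if raw is not None:
        try:
            return parse_version(raw)
        except ValueError:
            pass  # fall through to the installed package metadata
    try:
        return parse_version(importlib_version(package))
    except (PackageNotFoundError, ValueError):
        return default


print(detect_version("definitely-not-installed-pkg-xyz", raw="0.27.0"))
print(detect_version("definitely-not-installed-pkg-xyz", raw="garbage"))
```

Anything other than a malformed version string or a missing package, for example a TypeError from passing a non-string, propagates to the caller instead of silently becoming 0.0.0.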

* Update rl_replacements.py
* Update rl_replacements.py
* Update rl.py
* Update rl_replacements.py
* Update rl_replacements.py
* Update rl.py
* Update rl.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update rl_replacements.py
* Update rl.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update rl_replacements.py, remove chat template from codexes commits
* Update rl.py, got rid of gradient checkpointing code that did not work

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

pluesclues added 10 commits January 26, 2026 12:35

Update rl_replacements.py

acf90f0

Merge branch 'unslothai:main' into trl_0.27.0_update

886664e

Update rl_replacements.py

707fa38

Merge branch 'unslothai:main' into trl_0.27.0_update

cf24add

Merge branch 'unslothai:main' into trl_0.27.0_update

37c64bd

Update rl.py

44beb30

Update rl_replacements.py

48c3c19

Update rl_replacements.py

c2a6fc1

Update rl.py

b470807

Update rl.py

e09f716

pluesclues mentioned this pull request Feb 3, 2026

Update unsloth to be compatible with trl 0.27.0 unslothai/unsloth-zoo#457

Merged

[pre-commit.ci] auto fixes from pre-commit.com hooks

58a1fd4

for more information, see https://pre-commit.ci

chatgpt-codex-connector Bot reviewed Feb 3, 2026

gemini-code-assist Bot reviewed Feb 3, 2026

danielhanchen mentioned this pull request Feb 3, 2026

Add TRL truncation regression and metadata loss fixes (Fixes 1 and 3) #3971

Merged

pluesclues and others added 7 commits February 3, 2026 14:00

Update rl_replacements.py

d8a7c7d

Merge branch 'unslothai:main' into trl_0.27.0_update

f903be4

Update rl.py

e7db06e

[pre-commit.ci] auto fixes from pre-commit.com hooks

832c03b

for more information, see https://pre-commit.ci

Merge branch 'unslothai:main' into trl_0.27.0_update

26977f5

Update rl_replacements.py, remove chat template from codexes commits

fa354fc

Update rl.py, got rid of gradient checkpointing code that did not work

f6002a6

danielhanchen merged commit 3be666e into unslothai:main Feb 5, 2026
1 check passed

danielhanchen merged 18 commits into unslothai:main from pluesclues:trl_0.27.0_update

