upgrade transformers to 5.4.0 by winglian · Pull Request #3562 · axolotl-ai-cloud/axolotl

winglian · 2026-03-31T00:44:34Z

Summary by CodeRabbit

Chores
- Updated the pinned `transformers` dependency from 5.3.0 to 5.4.0.

coderabbitai · 2026-03-31T00:44:51Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb38bb7d-4fc4-42b2-8aa4-8ee173e3f1a8


Walkthrough

Updated the pinned version of the transformers package in requirements.txt from 5.3.0 to 5.4.0, modifying only this single dependency constraint.

Changes

Cohort / File(s)	Summary
Dependency Version Update `requirements.txt`	Bumped `transformers` package version from `5.3.0` to `5.4.0`.
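
A quick way to confirm that an environment actually picked up the new pin is to compare the `requirements.txt` line with the installed distribution. The following is a minimal sketch, not part of this PR: the helper names `parse_exact_pin` and `pin_is_satisfied` are hypothetical, and the parser only handles plain `name==version` lines (no extras or environment markers).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_exact_pin(requirement_line: str) -> Optional[Tuple[str, str]]:
    """Return (package, version) for a `name==version` line, else None."""
    line = requirement_line.split("#", 1)[0].strip()  # drop trailing comments
    if "==" not in line:
        return None  # e.g. `peft>=0.18.1` is a range, not an exact pin
    name, pinned = line.split("==", 1)
    return name.strip(), pinned.strip()


def pin_is_satisfied(requirement_line: str) -> Optional[bool]:
    """True/False if the installed version matches the pin; None if it cannot be checked."""
    parsed = parse_exact_pin(requirement_line)
    if parsed is None:
        return None
    name, pinned = parsed
    try:
        return metadata.version(name) == pinned
    except metadata.PackageNotFoundError:
        return None  # package is not installed in this environment


print(parse_exact_pin("transformers==5.4.0"))  # ('transformers', '5.4.0')
```

Calling `pin_is_satisfied("transformers==5.4.0")` after installing the requirements would then report whether the environment matches the pin.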

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

#3361 — Updates the transformers dependency version in requirements.txt with similar version constraint modifications.
#3459 — Sets transformers to 5.3.0 in requirements.txt, which is the exact version bumped in this PR.
#3358 — Updates transformers dependency pinning in requirements.txt alongside other package version changes.

Suggested reviewers

NanoCode012
SalmanMohammadi

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'upgrade transformers to 5.4.0' directly and clearly describes the main change: updating a specific package version in requirements.txt.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@requirements.txt`:
- Line 15: Wrap the GradientCheckpointingLayer import in
activation_checkpointing.py with a try/except ImportError that mirrors the guard
in monkeypatch/gradient_checkpointing (__init__.py): first try importing from
the main transformers export, then fall back to importing from
transformers.modeling_layers, and if both fail set GradientCheckpointingLayer =
None (or raise a clear error/log) so code using GradientCheckpointingLayer
checks for None. Add a similar multi-path try/except fallback for obtaining
_flash_attention_forward in batch.py (follow the same three-layer fallback
pattern used by _flash_supports_window) so the code degrades gracefully if the
private symbol moved/was removed. Finally, guard any modifications to
ALL_ATTENTION_FUNCTIONS with runtime checks (hasattr membership checks and a
version gate using transformers.__version__ parsed via packaging.version.parse)
so monkey-patches only apply when the expected private API symbols exist and the
transformers version is compatible.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 20ff2ce8-ff07-4235-89d9-da14f748a5a7

📥 Commits

Reviewing files that changed from the base of the PR and between bb622b8 and 3143392.

📒 Files selected for processing (1)

requirements.txt

coderabbitai · 2026-03-31T00:49:15Z

 peft>=0.18.1
 tokenizers>=0.22.1
-transformers==5.3.0
+transformers==5.4.0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

transformers 5.4.0 release notes breaking changes from 5.3.0 Hugging Face

💡 Result:

Hugging Face Transformers v5.4.0 was released on 2026-03-27. The full release notes are available at https://github.com/huggingface/transformers/releases/tag/v5.4.0.

Breaking changes in v5.4.0 (specific to this minor release from v5.3.0):
- Refactor Image Processors to support different backends (PR #43514): The dual BaseImageProcessor/BaseImageProcessorFast design has been replaced with a unified backend architecture. The image_processing_utils_fast module has been removed. Users should migrate to the new unified ImageProcessor.
- Validate config attributes (PR #41250): Added validation for PretrainedConfig attributes, which may cause type checking failures in some cases (see issue #45071).
- Initial support for FA4 with breaking changes (PR #42435).
- Changes to weight tying behavior.

Note: Many broader v5.x breaking changes (e.g., dynamic weight loading, tokenization simplifications, removal of TF/JAX support) were introduced in v5.0.0. For migration from v4.x, refer to MIGRATION_GUIDE_V5.md.

Recent issues reported post-v5.4.0 release include breaks for models using remote_code (issue #45020) and PretrainedConfig type checking (issue #45071), dated around 2026-03-28.

Citations:

1: https://newreleases.io/project/github/huggingface/transformers/release/v5.4.0

2: https://github.com/huggingface/transformers/releases

3: https://pypi.org/project/transformers/5.4.0/

4: https://pypi.org/project/transformers/

5: v5.4.0 breaks PretrainedConfig type checking huggingface/transformers#45071

6: Recent transformers versions break models using remote_code huggingface/transformers#45020
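
The release-notes result above reports that a module was removed in 5.4.0. One way to probe for that kind of change without triggering an import error is to ask the import system whether a dotted module path still resolves. This is a minimal sketch using only the standard library; `module_exists` is a hypothetical helper name, and the removal itself is taken from the web result rather than verified here.

```python
import importlib.util


def module_exists(dotted_name: str) -> bool:
    """Return True if `dotted_name` resolves to an importable module.

    find_spec() imports parent packages to locate a submodule, so a missing
    parent surfaces as ModuleNotFoundError (an ImportError subclass).
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


print(module_exists("json.decoder"))  # True: ships with the standard library
print(module_exists("json.no_such_submodule"))  # False
```

In an environment with transformers installed, `module_exists("transformers.image_processing_utils_fast")` would be expected to differ between 5.3.0 and 5.4.0 if the module was indeed removed.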

🏁 Script executed:

#!/bin/bash
# Description: Verify that all private/internal transformers APIs used in monkey-patches are still accessible in the codebase

echo "=== Checking imports of private transformers APIs ==="

# Check _flash_attention_forward usage
echo -e "\n1. Checking _flash_attention_forward usage:"
rg -n "modeling_flash_attention_utils\._flash_attention_forward" --type=py -C2

# Check _flash_supports_window imports with fallbacks
echo -e "\n2. Checking _flash_supports_window imports:"
rg -n "_flash_supports_window" --type=py -C2

# Check ALL_ATTENTION_FUNCTIONS usage
echo -e "\n3. Checking ALL_ATTENTION_FUNCTIONS registry modifications:"
rg -n "ALL_ATTENTION_FUNCTIONS" --type=py -C3

# Check GradientCheckpointingLayer with version gates
echo -e "\n4. Checking GradientCheckpointingLayer imports:"
rg -n "GradientCheckpointingLayer" --type=py -C3

# Find all transformers version checks
echo -e "\n5. Finding all transformers version boundary checks:"
rg -n "transformers_version|version.*transformers" --type=py -C2

Repository: axolotl-ai-cloud/axolotl

Length of output: 21244

🏁 Script executed:

#!/bin/bash
# Description: Check the signature and implementation of the function that replaces _flash_attention_forward

echo "=== Checking create_flash_attn_forward_varlen_llama3 implementation ==="
ast-grep --pattern $'def create_flash_attn_forward_varlen_llama3($$$) { $$$ }'

Repository: axolotl-ai-cloud/axolotl

Length of output: 139

🏁 Script executed:

#!/bin/bash
# Search for create_flash_attn_forward_varlen_llama3 function definition using proper Python pattern

echo "=== Searching for create_flash_attn_forward_varlen_llama3 definition ==="
rg -n "def create_flash_attn_forward_varlen_llama3" --type=py -A 20

echo -e "\n=== Searching for check_params function ==="
rg -n "def check_params" --type=py -A 10

echo -e "\n=== Checking activation_checkpointing.py for GradientCheckpointingLayer import ==="
rg -n "GradientCheckpointingLayer" src/axolotl/core/trainers/mixins/activation_checkpointing.py -B 5 -A 5

echo -e "\n=== Checking if there are any try/except blocks around GradientCheckpointingLayer import in activation_checkpointing.py ==="
head -30 src/axolotl/core/trainers/mixins/activation_checkpointing.py

Repository: axolotl-ai-cloud/axolotl

Length of output: 3779

🏁 Script executed:

#!/bin/bash
# Check what check_params is - it might be builtin or from somewhere

echo "=== Checking context around line 176 of batch.py ==="
sed -n '170,180p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py

echo -e "\n=== Searching for check_params import or definition ==="
rg -n "check_params" --type=py -B 2 -A 2

echo -e "\n=== Getting full signature of returned _flash_attention_forward ==="
sed -n '61,165p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py | head -50

echo -e "\n=== Checking if GradientCheckpointingLayer is exported from transformers main module ==="
rg -n "GradientCheckpointingLayer" --type=py | grep -E "(from transformers import|^src/axolotl/.*:.*from transformers.*GradientCheckpointingLayer)"

Repository: axolotl-ai-cloud/axolotl

Length of output: 4565

🏁 Script executed:

#!/bin/bash
# Critical verification: Check if GradientCheckpointingLayer is available via both import paths

echo "=== Checking if GradientCheckpointingLayer is exported from transformers main __init__ ==="
rg -n "GradientCheckpointingLayer" . --type=py | grep -v ".pyc" | grep -v test | head -20

echo -e "\n=== Checking transformers.modeling_layers path in imports ==="
rg -n "transformers.modeling_layers" --type=py

echo -e "\n=== Checking if there are any compatibility tests or guards around GradientCheckpointingLayer usage ==="
rg -n "GradientCheckpointingLayer" -A 2 -B 2 --type=py | grep -E "(try|except|ImportError|version|if)" -B 2 -A 2

echo -e "\n=== Check recent commits or issues related to GradientCheckpointingLayer in axolotl ==="
git log --oneline -20 --all -- "*GradientCheckpointingLayer*" 2>/dev/null || echo "Git log not available"

echo -e "\n=== Check if there's any conditional import or error handling in activation_checkpointing.py ==="
head -60 src/axolotl/core/trainers/mixins/activation_checkpointing.py

Repository: axolotl-ai-cloud/axolotl

Length of output: 4092

🏁 Script executed:

#!/bin/bash
# Verify if GradientCheckpointingLayer is exported from transformers main module

echo "=== Checking if transformers exports GradientCheckpointingLayer from main module ==="
python3 << 'EOF'
import sys

try:
    from transformers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS exported from transformers main module")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT exported from transformers: {e}")

try:
    from transformers.modeling_layers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS available from transformers.modeling_layers")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT available from transformers.modeling_layers: {e}")
EOF

echo -e "\n=== Checking if _flash_attention_forward exists and is accessible ==="
python3 << 'EOF'
try:
    import transformers.modeling_flash_attention_utils
    if hasattr(transformers.modeling_flash_attention_utils, '_flash_attention_forward'):
        print("✓ _flash_attention_forward exists in transformers.modeling_flash_attention_utils")
    else:
        print("✗ _flash_attention_forward NOT FOUND in transformers.modeling_flash_attention_utils")
except Exception as e:
    print(f"✗ Error accessing _flash_attention_forward: {e}")
EOF

Repository: axolotl-ai-cloud/axolotl

Length of output: 499

Verify compatibility of monkey-patched private transformers APIs with 5.4.0 and fix unguarded GradientCheckpointingLayer import.

The codebase extensively monkey-patches and imports private/internal transformers APIs that have no stability guarantee across versions. Upgrading from 5.3.0 to 5.4.0 risks breaking these integrations. Critical concerns:

Unguarded GradientCheckpointingLayer import: src/axolotl/core/trainers/mixins/activation_checkpointing.py:13 imports from transformers main module without version gating, while src/axolotl/monkeypatch/gradient_checkpointing/__init__.py:17 guards it with > 4.51.3 and imports from transformers.modeling_layers. These differ in both path and gating—if transformers doesn't re-export from the main module, this import will fail at runtime.

Inconsistent fallback patterns: _flash_supports_window has a three-layer try/except fallback in batch.py:19-30, but _flash_attention_forward (accessed at batch.py:170, 177) has no fallback. If transformers 5.4.0 changed these private APIs, only the former would gracefully degrade.

Private API registry modifications: ALL_ATTENTION_FUNCTIONS is modified at multiple sites without version checks or fallbacks.
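
The registry concern could be addressed with a guard along these lines. This is a minimal sketch rather than axolotl's actual code: `patch_if_compatible` is a hypothetical helper, the version bounds are illustrative, and a plain dict stands in for `ALL_ATTENTION_FUNCTIONS`, which is assumed here to behave like a mutable mapping. Version parsing uses `packaging.version.parse`, as suggested in the review.

```python
from typing import Any, MutableMapping

from packaging import version


def patch_if_compatible(
    registry: MutableMapping[str, Any],
    key: str,
    replacement: Any,
    installed_version: str,
    min_version: str,
    max_version_exclusive: str,
) -> bool:
    """Replace registry[key] only when the version is in range and the key exists.

    Returns True if the patch was applied, False if it was skipped.
    """
    installed = version.parse(installed_version)
    in_range = (
        version.parse(min_version) <= installed < version.parse(max_version_exclusive)
    )
    if not in_range or key not in registry:
        return False
    registry[key] = replacement
    return True


# Stand-in for the real registry; the key name is illustrative.
registry = {"flash_attention_2": lambda *args, **kwargs: "original"}
applied = patch_if_compatible(
    registry,
    "flash_attention_2",
    lambda *args, **kwargs: "patched",
    installed_version="5.4.0",  # in real code: transformers.__version__
    min_version="5.3.0",  # hypothetical tested range
    max_version_exclusive="5.5.0",
)
print(applied)  # True
```

Returning a boolean instead of raising lets the caller decide whether a skipped patch should be logged as a warning or treated as fatal.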

Please add try/except protection to the GradientCheckpointingLayer import in activation_checkpointing.py, add a fallback for _flash_attention_forward import, and confirm all monkey-patched paths still exist in transformers 5.4.0.
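
The requested import protection can be sketched as a small multi-path lookup. This is an illustration under stated assumptions, not the repository's implementation: `import_first_available` and `load_gradient_checkpointing_layer` are hypothetical names, and the two candidate paths are the ones named in this review.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first attribute found among (module, attribute) pairs, else None."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module moved or package not installed; try the next path
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


def load_gradient_checkpointing_layer() -> Optional[Any]:
    """Try the top-level export first, then the submodule path."""
    return import_first_available(
        [
            ("transformers", "GradientCheckpointingLayer"),
            ("transformers.modeling_layers", "GradientCheckpointingLayer"),
        ]
    )


# Demonstration with standard-library names so the sketch runs anywhere.
found = import_first_available(
    [("no_such_module_xyz", "OrderedDict"), ("collections", "OrderedDict")]
)
print(found.__name__)  # OrderedDict
```

Callers would then check the result for `None` before using it. The same helper could cover `_flash_attention_forward` by listing `("transformers.modeling_flash_attention_utils", "_flash_attention_forward")` as a candidate.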


codecov · 2026-03-31T03:32:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


upgrade transformers to 5.4.0

3143392

coderabbitai Bot reviewed Mar 31, 2026


allow fail for tests requiring phi3 tokenizer

381c09b

winglian added 2 commits March 31, 2026 01:32

ring-flash-attn skips

2622876

skip tests for now

855e4f8

winglian merged commit 5e5603c into main Mar 31, 2026
30 of 31 checks passed

winglian deleted the transformers-540 branch March 31, 2026 23:16

coderabbitai Bot mentioned this pull request Apr 10, 2026

upgrade transformers to use v5.5.3 #3593

Merged

winglian merged 4 commits into main from transformers-540
