upgrade transformers to 5.4.0 #3562

Merged
winglian merged 4 commits into main from transformers-540 on Mar 31, 2026

@winglian winglian commented Mar 31, 2026

Collaborator

Summary by CodeRabbit

  • Chores
    • Updated the pinned transformers dependency from 5.3.0 to 5.4.0.

coderabbitai Bot commented Mar 31, 2026

Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb38bb7d-4fc4-42b2-8aa4-8ee173e3f1a8

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Updated the pinned version of the transformers package in requirements.txt from 5.3.0 to 5.4.0, modifying only this single dependency constraint.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency Version Update (requirements.txt) | Bumped transformers package version from 5.3.0 to 5.4.0. |
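
An exact pin like this one can be checked mechanically, for example in a CI step. The following is only a sketch under stated assumptions: `pinned_version` is a hypothetical helper (not part of axolotl), the sample text copies just the three requirement lines visible in this PR's diff, and extras or environment markers are not handled.

```python
import re
from typing import Optional


def pinned_version(requirements_text: str, package: str) -> Optional[str]:
    """Return the version of an exact `package==X.Y.Z` pin, or None if absent.

    Hypothetical helper: range constraints such as `>=` are deliberately
    not treated as pins.
    """
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*==\s*([^\s;#]+)", re.IGNORECASE
    )
    for line in requirements_text.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Sample mirrors the requirement lines shown in this PR's diff.
sample = "peft>=0.18.1\ntokenizers>=0.22.1\ntransformers==5.4.0\n"

print(pinned_version(sample, "transformers"))
print(pinned_version(sample, "peft"))
```

The first call reports the exact pin, while `peft` yields `None` because it is only lower-bounded, not pinned.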

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • #3361 — Updates the transformers dependency version in requirements.txt with similar version constraint modifications.
  • #3459 — Sets transformers to 5.3.0 in requirements.txt, which is the exact version bumped in this PR.
  • #3358 — Updates transformers dependency pinning in requirements.txt alongside other package version changes.

Suggested reviewers

  • NanoCode012
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'upgrade transformers to 5.4.0' directly and clearly describes the main change: updating a specific package version in requirements.txt. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@requirements.txt`:
- Line 15: Wrap the GradientCheckpointingLayer import in
activation_checkpointing.py with a try/except ImportError that mirrors the guard
in monkeypatch/gradient_checkpointing (__init__.py): first try importing from
the main transformers export, then fall back to importing from
transformers.modeling_layers, and if both fail set GradientCheckpointingLayer =
None (or raise a clear error/log) so code using GradientCheckpointingLayer
checks for None. Add a similar multi-path try/except fallback for obtaining
_flash_attention_forward in batch.py (follow the same three-layer fallback
pattern used by _flash_supports_window) so the code degrades gracefully if the
private symbol moved/was removed. Finally, guard any modifications to
ALL_ATTENTION_FUNCTIONS with runtime checks (hasattr membership checks and a
version gate using transformers.__version__ parsed via packaging.version.parse)
so monkey-patches only apply when the expected private API symbols exist and the
transformers version is compatible.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 20ff2ce8-ff07-4235-89d9-da14f748a5a7

📥 Commits

Reviewing files that changed from the base of the PR and between bb622b8 and 3143392.

📒 Files selected for processing (1)
  • requirements.txt

Comment thread requirements.txt
 peft>=0.18.1
 tokenizers>=0.22.1
-transformers==5.3.0
+transformers==5.4.0

Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

transformers 5.4.0 release notes breaking changes from 5.3.0 Hugging Face

💡 Result:

Hugging Face Transformers v5.4.0 was released on 2026-03-27. The full release notes are available at https://github.com/huggingface/transformers/releases/tag/v5.4.0.

Breaking changes in v5.4.0 (specific to this minor release from v5.3.0):

  • Refactor Image Processors to support different backends (PR #43514): The dual BaseImageProcessor/BaseImageProcessorFast design has been replaced with a unified backend architecture. The image_processing_utils_fast module has been removed. Users should migrate to the new unified ImageProcessor.
  • Validate config attributes (PR #41250): Added validation for PretrainedConfig attributes, which may cause type checking failures in some cases (see issue #45071).
  • Initial support for FA4 with breaking changes (PR #42435).
  • Changes to weight tying behavior.

Note: Many broader v5.x breaking changes (e.g., dynamic weight loading, tokenization simplifications, removal of TF/JAX support) were introduced in v5.0.0. For migration from v4.x, refer to MIGRATION_GUIDE_V5.md.

Recent issues reported post-v5.4.0 release include breaks for models using remote_code (issue #45020) and PretrainedConfig type checking (issue #45071), dated around 2026-03-28.


🏁 Script executed:

#!/bin/bash
# Description: Verify that all private/internal transformers APIs used in monkey-patches are still accessible in the codebase

echo "=== Checking imports of private transformers APIs ==="

# Check _flash_attention_forward usage
echo -e "\n1. Checking _flash_attention_forward usage:"
rg -n "modeling_flash_attention_utils\._flash_attention_forward" --type=py -C2

# Check _flash_supports_window imports with fallbacks
echo -e "\n2. Checking _flash_supports_window imports:"
rg -n "_flash_supports_window" --type=py -C2

# Check ALL_ATTENTION_FUNCTIONS usage
echo -e "\n3. Checking ALL_ATTENTION_FUNCTIONS registry modifications:"
rg -n "ALL_ATTENTION_FUNCTIONS" --type=py -C3

# Check GradientCheckpointingLayer with version gates
echo -e "\n4. Checking GradientCheckpointingLayer imports:"
rg -n "GradientCheckpointingLayer" --type=py -C3

# Find all transformers version checks
echo -e "\n5. Finding all transformers version boundary checks:"
rg -n "transformers_version|version.*transformers" --type=py -C2

Repository: axolotl-ai-cloud/axolotl

Length of output: 21244


🏁 Script executed:

#!/bin/bash
# Description: Check the signature and implementation of the function that replaces _flash_attention_forward

echo "=== Checking create_flash_attn_forward_varlen_llama3 implementation ==="
ast-grep --pattern $'def create_flash_attn_forward_varlen_llama3($$$) {
  $$$
}'

Repository: axolotl-ai-cloud/axolotl

Length of output: 139


🏁 Script executed:

#!/bin/bash
# Search for create_flash_attn_forward_varlen_llama3 function definition using proper Python pattern
echo "=== Searching for create_flash_attn_forward_varlen_llama3 definition ==="
rg -n "def create_flash_attn_forward_varlen_llama3" --type=py -A 20

echo -e "\n=== Searching for check_params function ==="
rg -n "def check_params" --type=py -A 10

echo -e "\n=== Checking activation_checkpointing.py for GradientCheckpointingLayer import ==="
rg -n "GradientCheckpointingLayer" src/axolotl/core/trainers/mixins/activation_checkpointing.py -B 5 -A 5

echo -e "\n=== Checking if there are any try/except blocks around GradientCheckpointingLayer import in activation_checkpointing.py ==="
head -30 src/axolotl/core/trainers/mixins/activation_checkpointing.py

Repository: axolotl-ai-cloud/axolotl

Length of output: 3779


🏁 Script executed:

#!/bin/bash
# Check what check_params is - it might be builtin or from somewhere
echo "=== Checking context around line 176 of batch.py ==="
sed -n '170,180p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py

echo -e "\n=== Searching for check_params import or definition ==="
rg -n "check_params" --type=py -B 2 -A 2

echo -e "\n=== Getting full signature of returned _flash_attention_forward ==="
sed -n '61,165p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py | head -50

echo -e "\n=== Checking if GradientCheckpointingLayer is exported from transformers main module ==="
rg -n "GradientCheckpointingLayer" --type=py | grep -E "(from transformers import|^src/axolotl/.*:.*from transformers.*GradientCheckpointingLayer)"

Repository: axolotl-ai-cloud/axolotl

Length of output: 4565


🏁 Script executed:

#!/bin/bash
# Critical verification: Check if GradientCheckpointingLayer is available via both import paths
echo "=== Checking if GradientCheckpointingLayer is exported from transformers main __init__ ==="
rg -n "GradientCheckpointingLayer" . --type=py | grep -v ".pyc" | grep -v test | head -20

echo -e "\n=== Checking transformers.modeling_layers path in imports ==="
rg -n "transformers.modeling_layers" --type=py

echo -e "\n=== Checking if there are any compatibility tests or guards around GradientCheckpointingLayer usage ==="
rg -n "GradientCheckpointingLayer" -A 2 -B 2 --type=py | grep -E "(try|except|ImportError|version|if)" -B 2 -A 2

echo -e "\n=== Check recent commits or issues related to GradientCheckpointingLayer in axolotl ==="
git log --oneline -20 --all -- "*GradientCheckpointingLayer*" 2>/dev/null || echo "Git log not available"

echo -e "\n=== Check if there's any conditional import or error handling in activation_checkpointing.py ==="
head -60 src/axolotl/core/trainers/mixins/activation_checkpointing.py

Repository: axolotl-ai-cloud/axolotl

Length of output: 4092


🏁 Script executed:

#!/bin/bash
# Verify if GradientCheckpointingLayer is exported from transformers main module
echo "=== Checking if transformers exports GradientCheckpointingLayer from main module ==="
python3 << 'EOF'
import sys
try:
    from transformers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS exported from transformers main module")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT exported from transformers: {e}")

try:
    from transformers.modeling_layers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS available from transformers.modeling_layers")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT available from transformers.modeling_layers: {e}")
EOF

echo -e "\n=== Checking if _flash_attention_forward exists and is accessible ==="
python3 << 'EOF'
try:
    import transformers.modeling_flash_attention_utils
    if hasattr(transformers.modeling_flash_attention_utils, '_flash_attention_forward'):
        print("✓ _flash_attention_forward exists in transformers.modeling_flash_attention_utils")
    else:
        print("✗ _flash_attention_forward NOT FOUND in transformers.modeling_flash_attention_utils")
except Exception as e:
    print(f"✗ Error accessing _flash_attention_forward: {e}")
EOF

Repository: axolotl-ai-cloud/axolotl

Length of output: 499


Verify compatibility of monkey-patched private transformers APIs with 5.4.0 and fix unguarded GradientCheckpointingLayer import.

The codebase extensively monkey-patches and imports private/internal transformers APIs that have no stability guarantee across versions. Upgrading from 5.3.0 to 5.4.0 risks breaking these integrations. Critical concerns:

  1. Unguarded GradientCheckpointingLayer import: src/axolotl/core/trainers/mixins/activation_checkpointing.py:13 imports it from the top-level transformers module without version gating, while src/axolotl/monkeypatch/gradient_checkpointing/__init__.py:17 imports it from transformers.modeling_layers behind a > 4.51.3 version check. The two sites differ in both import path and gating. If transformers does not re-export the class from its top-level module, the first import will fail at runtime.

  2. Inconsistent fallback patterns: _flash_supports_window has a three-layer try/except fallback in batch.py:19-30, but _flash_attention_forward (accessed at batch.py:170, 177) has no fallback. If transformers 5.4.0 changed these private APIs, only the former would gracefully degrade.

  3. Private API registry modifications: ALL_ATTENTION_FUNCTIONS is modified at multiple sites without version checks or fallbacks.
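
The multi-path fallback requested in items 1 and 2 can be sketched with one small helper. This is a minimal sketch, not axolotl's actual code: `import_first_available` is a hypothetical name, and the candidate lists encode only the import paths mentioned in this comment. Whether any of those paths resolves in transformers 5.4.0 is exactly what the helper is meant to find out at import time.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first (module, attribute) pair that resolves, else None.

    Hypothetical helper that generalizes a chained try/except ImportError.
    """
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError, RuntimeError):
            # ImportError: the module is missing.
            # AttributeError: the symbol moved or was removed.
            # RuntimeError: defensive, in case a lazily loaded package
            # reports a failed submodule import that way.
            continue
    return None


# Import paths taken from the review comment; None means neither resolved.
GradientCheckpointingLayer = import_first_available(
    [
        ("transformers", "GradientCheckpointingLayer"),
        ("transformers.modeling_layers", "GradientCheckpointingLayer"),
    ]
)

# Only one path is named in the review for this private symbol.
_flash_attention_forward = import_first_available(
    [("transformers.modeling_flash_attention_utils", "_flash_attention_forward")]
)

if GradientCheckpointingLayer is None:
    print("GradientCheckpointingLayer unavailable; dependent code must skip it")
```

Code that uses either name would then check for `None` before subclass checks or patching, rather than failing at import time.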

Please add try/except protection to the GradientCheckpointingLayer import in activation_checkpointing.py, add a fallback for _flash_attention_forward import, and confirm all monkey-patched paths still exist in transformers 5.4.0.
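
The guard suggested for ALL_ATTENTION_FUNCTIONS (a membership check plus a version gate) might look like the sketch below. Everything here is illustrative: `patch_registry_entry` is a hypothetical helper, a plain dict stands in for the real registry so the example runs without transformers, and the `5.3.0` lower bound is an assumed value, not one taken from axolotl. The small `parse_version` fallback exists only so the sketch runs where `packaging` is not installed.

```python
from itertools import takewhile
from typing import Callable, MutableMapping, Optional, Tuple

try:
    from packaging.version import parse as parse_version
except ImportError:
    # Fallback for environments without `packaging`: compares only the
    # leading numeric components and ignores pre-release tags.
    def parse_version(text: str) -> Tuple[int, ...]:
        parts = []
        for piece in text.split("."):
            digits = "".join(takewhile(str.isdigit, piece))
            if not digits:
                break
            parts.append(int(digits))
        return tuple(parts)


def patch_registry_entry(
    registry: Optional[MutableMapping[str, Callable]],
    key: str,
    replacement: Callable,
    installed_version: str,
    min_version: str,
) -> bool:
    """Replace registry[key] only if the key exists and the version is new enough."""
    if registry is None or key not in registry:
        return False  # expected entry is absent: leave everything untouched
    if parse_version(installed_version) < parse_version(min_version):
        return False  # older than the version the patch was written against
    registry[key] = replacement
    return True


# Plain-dict stand-in for the registry; the key is illustrative.
fake_registry = {"flash_attention_2": lambda *args, **kwargs: "original"}
patched = patch_registry_entry(
    fake_registry,
    "flash_attention_2",
    lambda *args, **kwargs: "patched",
    installed_version="5.4.0",
    min_version="5.3.0",  # hypothetical lower bound
)
print(patched, fake_registry["flash_attention_2"]())
```

In real use, `installed_version` would come from `transformers.__version__`, and a `False` return would be logged so a skipped patch is visible rather than silent.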

@codecov

codecov Bot commented Mar 31, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

@winglian winglian merged commit 5e5603c into main Mar 31, 2026
30 of 31 checks passed
@winglian winglian deleted the transformers-540 branch March 31, 2026 23:16