upgrade transformers to 5.4.0 #3562
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@requirements.txt`:
- Line 15: Wrap the GradientCheckpointingLayer import in
activation_checkpointing.py with a try/except ImportError that mirrors the guard
in monkeypatch/gradient_checkpointing (__init__.py): first try importing from
the main transformers export, then fall back to importing from
transformers.modeling_layers, and if both fail set GradientCheckpointingLayer =
None (or raise a clear error/log) so code using GradientCheckpointingLayer
checks for None. Add a similar multi-path try/except fallback for obtaining
_flash_attention_forward in batch.py (follow the same three-layer fallback
pattern used by _flash_supports_window) so the code degrades gracefully if the
private symbol moved/was removed. Finally, guard any modifications to
ALL_ATTENTION_FUNCTIONS with runtime checks (hasattr membership checks and a
version gate using transformers.__version__ parsed via packaging.version.parse)
so monkey-patches only apply when the expected private API symbols exist and the
transformers version is compatible.
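A minimal sketch of the version gate and membership check described above, assuming the registry behaves like a mapping. The helper name `should_patch`, the version bounds, and the stand-in registry are hypothetical and are not taken from the axolotl code; the real `ALL_ATTENTION_FUNCTIONS` object would be passed in instead of the dictionary.

```python
import re
from typing import Mapping, Optional

try:
    # The review names packaging.version.parse for version comparison.
    from packaging.version import parse as parse_version
except ImportError:
    # Crude stand-in (assumption) so the sketch still runs without packaging:
    # it compares only the first three numeric components.
    def parse_version(text: str):
        return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def should_patch(
    registry: Optional[Mapping],
    key: str,
    installed: str,
    tested_min: str,
    tested_max_exclusive: str,
) -> bool:
    """Return True only when the version is in the tested range and `key` is registered."""
    version = parse_version(installed)
    in_range = parse_version(tested_min) <= version < parse_version(tested_max_exclusive)
    return in_range and registry is not None and key in registry


# Stand-in for the real registry; the key name is illustrative only.
fake_registry = {"flash_attention_2": lambda *args, **kwargs: None}

# Hypothetical bounds: patch only on versions that were actually tested.
if should_patch(fake_registry, "flash_attention_2", "5.4.0", "5.3.0", "5.5.0"):
    fake_registry["flash_attention_2"] = lambda *args, **kwargs: "patched"
```

In real use, `installed` would come from `transformers.__version__`, and a `False` result would mean skipping the monkey-patch (ideally with a log message) rather than failing later at call time.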
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 20ff2ce8-ff07-4235-89d9-da14f748a5a7
📒 Files selected for processing (1)
requirements.txt
 peft>=0.18.1
 tokenizers>=0.22.1
-transformers==5.3.0
+transformers==5.4.0
🧩 Analysis chain
🌐 Web query:
transformers 5.4.0 release notes breaking changes from 5.3.0 Hugging Face
💡 Result:
Hugging Face Transformers v5.4.0 was released on 2026-03-27. The full release notes are available at https://github.com/huggingface/transformers/releases/tag/v5.4.0.
Breaking changes in v5.4.0 (specific to this minor release from v5.3.0):
- Refactor Image Processors to support different backends (PR #43514): The dual BaseImageProcessor/BaseImageProcessorFast design has been replaced with a unified backend architecture. The image_processing_utils_fast module has been removed. Users should migrate to the new unified ImageProcessor.
- Validate config attributes (PR #41250): Added validation for PretrainedConfig attributes, which may cause type checking failures in some cases (see issue #45071).
- Initial support for FA4 with breaking changes (PR #42435).
- Changes to weight tying behavior.
Note: Many broader v5.x breaking changes (e.g., dynamic weight loading, tokenization simplifications, removal of TF/JAX support) were introduced in v5.0.0. For migration from v4.x, refer to MIGRATION_GUIDE_V5.md.
Recent issues reported post-v5.4.0 release include breaks for models using remote_code (issue #45020) and PretrainedConfig type checking (issue #45071), dated around 2026-03-28.
Citations:
- 1: https://newreleases.io/project/github/huggingface/transformers/release/v5.4.0
- 2: https://github.com/huggingface/transformers/releases
- 3: https://pypi.org/project/transformers/5.4.0/
- 4: https://pypi.org/project/transformers/
- 5: v5.4.0 breaks `PretrainedConfig` type checking huggingface/transformers#45071
- 6: Recent transformers versions break models using `remote_code` huggingface/transformers#45020
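If the summary above is accurate about `image_processing_utils_fast` being removed, one way to find out whether a given environment is affected is to probe for the module before importing from it. The helper `module_available` below is a hypothetical name, and the dotted path assumes the module lives directly under the `transformers` package. It is a sketch, not part of the PR.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Report whether a module can be located without importing the module itself.

    find_spec imports parent packages, so a missing parent raises
    ModuleNotFoundError; that case is treated as 'not available'.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


# Assumed dotted path; adjust if the module lives elsewhere.
if not module_available("transformers.image_processing_utils_fast"):
    print("image_processing_utils_fast not found; code importing it needs migrating")
```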
🏁 Script executed:
#!/bin/bash
# Description: Verify that all private/internal transformers APIs used in monkey-patches are still accessible in the codebase
echo "=== Checking imports of private transformers APIs ==="
# Check _flash_attention_forward usage
echo -e "\n1. Checking _flash_attention_forward usage:"
rg -n "modeling_flash_attention_utils\._flash_attention_forward" --type=py -C2
# Check _flash_supports_window imports with fallbacks
echo -e "\n2. Checking _flash_supports_window imports:"
rg -n "_flash_supports_window" --type=py -C2
# Check ALL_ATTENTION_FUNCTIONS usage
echo -e "\n3. Checking ALL_ATTENTION_FUNCTIONS registry modifications:"
rg -n "ALL_ATTENTION_FUNCTIONS" --type=py -C3
# Check GradientCheckpointingLayer with version gates
echo -e "\n4. Checking GradientCheckpointingLayer imports:"
rg -n "GradientCheckpointingLayer" --type=py -C3
# Find all transformers version checks
echo -e "\n5. Finding all transformers version boundary checks:"
rg -n "transformers_version|version.*transformers" --type=py -C2
Repository: axolotl-ai-cloud/axolotl
Length of output: 21244
🏁 Script executed:
#!/bin/bash
# Description: Check the signature and implementation of the function that replaces _flash_attention_forward
echo "=== Checking create_flash_attn_forward_varlen_llama3 implementation ==="
ast-grep --pattern $'def create_flash_attn_forward_varlen_llama3($$$) {
$$$
}'
Repository: axolotl-ai-cloud/axolotl
Length of output: 139
🏁 Script executed:
#!/bin/bash
# Search for create_flash_attn_forward_varlen_llama3 function definition using proper Python pattern
echo "=== Searching for create_flash_attn_forward_varlen_llama3 definition ==="
rg -n "def create_flash_attn_forward_varlen_llama3" --type=py -A 20
echo -e "\n=== Searching for check_params function ==="
rg -n "def check_params" --type=py -A 10
echo -e "\n=== Checking activation_checkpointing.py for GradientCheckpointingLayer import ==="
rg -n "GradientCheckpointingLayer" src/axolotl/core/trainers/mixins/activation_checkpointing.py -B 5 -A 5
echo -e "\n=== Checking if there are any try/except blocks around GradientCheckpointingLayer import in activation_checkpointing.py ==="
head -30 src/axolotl/core/trainers/mixins/activation_checkpointing.py
Repository: axolotl-ai-cloud/axolotl
Length of output: 3779
🏁 Script executed:
#!/bin/bash
# Check what check_params is - it might be builtin or from somewhere
echo "=== Checking context around line 176 of batch.py ==="
sed -n '170,180p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py
echo -e "\n=== Searching for check_params import or definition ==="
rg -n "check_params" --type=py -B 2 -A 2
echo -e "\n=== Getting full signature of returned _flash_attention_forward ==="
sed -n '61,165p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py | head -50
echo -e "\n=== Checking if GradientCheckpointingLayer is exported from transformers main module ==="
rg -n "GradientCheckpointingLayer" --type=py | grep -E "(from transformers import|^src/axolotl/.*:.*from transformers.*GradientCheckpointingLayer)"
Repository: axolotl-ai-cloud/axolotl
Length of output: 4565
🏁 Script executed:
#!/bin/bash
# Critical verification: Check if GradientCheckpointingLayer is available via both import paths
echo "=== Checking if GradientCheckpointingLayer is exported from transformers main __init__ ==="
rg -n "GradientCheckpointingLayer" . --type=py | grep -v ".pyc" | grep -v test | head -20
echo -e "\n=== Checking transformers.modeling_layers path in imports ==="
rg -n "transformers.modeling_layers" --type=py
echo -e "\n=== Checking if there are any compatibility tests or guards around GradientCheckpointingLayer usage ==="
rg -n "GradientCheckpointingLayer" -A 2 -B 2 --type=py | grep -E "(try|except|ImportError|version|if)" -B 2 -A 2
echo -e "\n=== Check recent commits or issues related to GradientCheckpointingLayer in axolotl ==="
git log --oneline -20 --all -- "*GradientCheckpointingLayer*" 2>/dev/null || echo "Git log not available"
echo -e "\n=== Check if there's any conditional import or error handling in activation_checkpointing.py ==="
head -60 src/axolotl/core/trainers/mixins/activation_checkpointing.py
Repository: axolotl-ai-cloud/axolotl
Length of output: 4092
🏁 Script executed:
#!/bin/bash
# Verify if GradientCheckpointingLayer is exported from transformers main module
echo "=== Checking if transformers exports GradientCheckpointingLayer from main module ==="
python3 << 'EOF'
import sys
try:
    from transformers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS exported from transformers main module")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT exported from transformers: {e}")
try:
    from transformers.modeling_layers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS available from transformers.modeling_layers")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT available from transformers.modeling_layers: {e}")
EOF
echo -e "\n=== Checking if _flash_attention_forward exists and is accessible ==="
python3 << 'EOF'
try:
    import transformers.modeling_flash_attention_utils
    if hasattr(transformers.modeling_flash_attention_utils, '_flash_attention_forward'):
        print("✓ _flash_attention_forward exists in transformers.modeling_flash_attention_utils")
    else:
        print("✗ _flash_attention_forward NOT FOUND in transformers.modeling_flash_attention_utils")
except Exception as e:
    print(f"✗ Error accessing _flash_attention_forward: {e}")
EOF
Repository: axolotl-ai-cloud/axolotl
Length of output: 499
Verify compatibility of monkey-patched private transformers APIs with 5.4.0 and fix unguarded GradientCheckpointingLayer import.
The codebase extensively monkey-patches and imports private/internal transformers APIs that have no stability guarantee across versions. Upgrading from 5.3.0 to 5.4.0 risks breaking these integrations. Critical concerns:
- Unguarded GradientCheckpointingLayer import: `src/axolotl/core/trainers/mixins/activation_checkpointing.py:13` imports from the `transformers` main module without version gating, while `src/axolotl/monkeypatch/gradient_checkpointing/__init__.py:17` guards it with `> 4.51.3` and imports from `transformers.modeling_layers`. These differ in both path and gating; if transformers doesn't re-export from the main module, this import will fail at runtime.
- Inconsistent fallback patterns: `_flash_supports_window` has a three-layer try/except fallback in `batch.py:19-30`, but `_flash_attention_forward` (accessed at `batch.py:170, 177`) has no fallback. If transformers 5.4.0 changed these private APIs, only the former would gracefully degrade.
- Private API registry modifications: `ALL_ATTENTION_FUNCTIONS` is modified at multiple sites without version checks or fallbacks.
Please add try/except protection to the GradientCheckpointingLayer import in activation_checkpointing.py, add a fallback for _flash_attention_forward import, and confirm all monkey-patched paths still exist in transformers 5.4.0.
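The multi-path import fallback requested here could look roughly like the sketch below. The helper names `import_first_available` and `load_optional_symbols` are hypothetical, and the candidate order (main `transformers` export first, then `transformers.modeling_layers`) follows the review text rather than the existing axolotl code. Only one path is listed for `_flash_attention_forward` because the review does not name an alternative location.

```python
import importlib
from typing import Any, Iterable, Optional, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Optional[Any]:
    """Return the first resolvable attribute from (module, attribute) pairs, else None."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError):
            continue  # path missing in this install; try the next candidate
    return None


def load_optional_symbols() -> dict:
    """Resolve the two symbols discussed in the review; None means 'not found'."""
    return {
        "GradientCheckpointingLayer": import_first_available(
            [
                ("transformers", "GradientCheckpointingLayer"),
                ("transformers.modeling_layers", "GradientCheckpointingLayer"),
            ]
        ),
        "_flash_attention_forward": import_first_available(
            [("transformers.modeling_flash_attention_utils", "_flash_attention_forward")]
        ),
    }
```

Callers would then check each value for `None` before using it, and either skip the dependent feature or raise a clear error, as the review suggests.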
Codecov Report: ✅ All modified and coverable lines are covered by tests.