upgrade transformers to 5.4.0 #3562

coderabbitai · 2026-03-31T00:49:15Z

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

transformers 5.4.0 release notes breaking changes from 5.3.0 Hugging Face

💡 Result:

Hugging Face Transformers v5.4.0 was released on 2026-03-27. The full release notes are available at https://github.com/huggingface/transformers/releases/tag/v5.4.0.

Breaking changes in v5.4.0 (specific to this minor release from v5.3.0):
- Refactor Image Processors to support different backends (PR #43514): The dual BaseImageProcessor/BaseImageProcessorFast design has been replaced with a unified backend architecture. The image_processing_utils_fast module has been removed. Users should migrate to the new unified ImageProcessor.
- Validate config attributes (PR #41250): Added validation for PretrainedConfig attributes, which may cause type checking failures in some cases (see issue #45071).
- Initial support for FA4 with breaking changes (PR #42435).
- Changes to weight tying behavior.

Note: Many broader v5.x breaking changes (e.g., dynamic weight loading, tokenization simplifications, removal of TF/JAX support) were introduced in v5.0.0. For migration from v4.x, refer to MIGRATION_GUIDE_V5.md.

Recent issues reported after the v5.4.0 release include breakage of models using remote_code (issue #45020) and of PretrainedConfig type checking (issue #45071), dated around 2026-03-28.

Citations:

1: https://newreleases.io/project/github/huggingface/transformers/release/v5.4.0

2: https://github.com/huggingface/transformers/releases

3: https://pypi.org/project/transformers/5.4.0/

4: https://pypi.org/project/transformers/

5: v5.4.0 breaks PretrainedConfig type checking huggingface/transformers#45071

6: Recent transformers versions break models using remote_code huggingface/transformers#45020

🏁 Script executed:

#!/bin/bash
# Description: Verify that all private/internal transformers APIs used in monkey-patches are still accessible in the codebase
echo "=== Checking imports of private transformers APIs ==="
# Check _flash_attention_forward usage
echo -e "\n1. Checking _flash_attention_forward usage:"
rg -n "modeling_flash_attention_utils\._flash_attention_forward" --type=py -C2
# Check _flash_supports_window imports with fallbacks
echo -e "\n2. Checking _flash_supports_window imports:"
rg -n "_flash_supports_window" --type=py -C2
# Check ALL_ATTENTION_FUNCTIONS usage
echo -e "\n3. Checking ALL_ATTENTION_FUNCTIONS registry modifications:"
rg -n "ALL_ATTENTION_FUNCTIONS" --type=py -C3
# Check GradientCheckpointingLayer with version gates
echo -e "\n4. Checking GradientCheckpointingLayer imports:"
rg -n "GradientCheckpointingLayer" --type=py -C3
# Find all transformers version checks
echo -e "\n5. Finding all transformers version boundary checks:"
rg -n "transformers_version|version.*transformers" --type=py -C2

Repository: axolotl-ai-cloud/axolotl

Length of output: 21244

🏁 Script executed:

#!/bin/bash
# Description: Check the signature and implementation of the function that replaces _flash_attention_forward
echo "=== Checking create_flash_attn_forward_varlen_llama3 implementation ==="
ast-grep --pattern $'def create_flash_attn_forward_varlen_llama3($$$) { $$$ }'

Repository: axolotl-ai-cloud/axolotl

Length of output: 139

🏁 Script executed:

#!/bin/bash
# Search for create_flash_attn_forward_varlen_llama3 function definition using proper Python pattern
echo "=== Searching for create_flash_attn_forward_varlen_llama3 definition ==="
rg -n "def create_flash_attn_forward_varlen_llama3" --type=py -A 20
echo -e "\n=== Searching for check_params function ==="
rg -n "def check_params" --type=py -A 10
echo -e "\n=== Checking activation_checkpointing.py for GradientCheckpointingLayer import ==="
rg -n "GradientCheckpointingLayer" src/axolotl/core/trainers/mixins/activation_checkpointing.py -B 5 -A 5
echo -e "\n=== Checking if there are any try/except blocks around GradientCheckpointingLayer import in activation_checkpointing.py ==="
head -30 src/axolotl/core/trainers/mixins/activation_checkpointing.py

Repository: axolotl-ai-cloud/axolotl

Length of output: 3779

🏁 Script executed:

#!/bin/bash
# Check what check_params is - it might be builtin or from somewhere
echo "=== Checking context around line 176 of batch.py ==="
sed -n '170,180p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py
echo -e "\n=== Searching for check_params import or definition ==="
rg -n "check_params" --type=py -B 2 -A 2
echo -e "\n=== Getting full signature of returned _flash_attention_forward ==="
sed -n '61,165p' src/axolotl/monkeypatch/ring_attn/adapters/batch.py | head -50
echo -e "\n=== Checking if GradientCheckpointingLayer is exported from transformers main module ==="
rg -n "GradientCheckpointingLayer" --type=py | grep -E "(from transformers import|^src/axolotl/.*:.*from transformers.*GradientCheckpointingLayer)"

Repository: axolotl-ai-cloud/axolotl

Length of output: 4565

🏁 Script executed:

#!/bin/bash
# Critical verification: Check if GradientCheckpointingLayer is available via both import paths
echo "=== Checking if GradientCheckpointingLayer is exported from transformers main __init__ ==="
rg -n "GradientCheckpointingLayer" . --type=py | grep -v ".pyc" | grep -v test | head -20
echo -e "\n=== Checking transformers.modeling_layers path in imports ==="
rg -n "transformers.modeling_layers" --type=py
echo -e "\n=== Checking if there are any compatibility tests or guards around GradientCheckpointingLayer usage ==="
rg -n "GradientCheckpointingLayer" -A 2 -B 2 --type=py | grep -E "(try|except|ImportError|version|if)" -B 2 -A 2
echo -e "\n=== Check recent commits or issues related to GradientCheckpointingLayer in axolotl ==="
git log --oneline -20 --all -- "*GradientCheckpointingLayer*" 2>/dev/null || echo "Git log not available"
echo -e "\n=== Check if there's any conditional import or error handling in activation_checkpointing.py ==="
head -60 src/axolotl/core/trainers/mixins/activation_checkpointing.py

Repository: axolotl-ai-cloud/axolotl

Length of output: 4092

🏁 Script executed:

#!/bin/bash
# Verify if GradientCheckpointingLayer is exported from transformers main module
echo "=== Checking if transformers exports GradientCheckpointingLayer from main module ==="
python3 << 'EOF'
import sys
try:
    from transformers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS exported from transformers main module")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT exported from transformers: {e}")
try:
    from transformers.modeling_layers import GradientCheckpointingLayer
    print("✓ GradientCheckpointingLayer IS available from transformers.modeling_layers")
except ImportError as e:
    print(f"✗ GradientCheckpointingLayer NOT available from transformers.modeling_layers: {e}")
EOF
echo -e "\n=== Checking if _flash_attention_forward exists and is accessible ==="
python3 << 'EOF'
try:
    import transformers.modeling_flash_attention_utils
    if hasattr(transformers.modeling_flash_attention_utils, '_flash_attention_forward'):
        print("✓ _flash_attention_forward exists in transformers.modeling_flash_attention_utils")
    else:
        print("✗ _flash_attention_forward NOT FOUND in transformers.modeling_flash_attention_utils")
except Exception as e:
    print(f"✗ Error accessing _flash_attention_forward: {e}")
EOF

Repository: axolotl-ai-cloud/axolotl

Length of output: 499

Verify compatibility of monkey-patched private transformers APIs with 5.4.0 and fix unguarded GradientCheckpointingLayer import.

The codebase extensively monkey-patches and imports private/internal transformers APIs that have no stability guarantee across versions. Upgrading from 5.3.0 to 5.4.0 risks breaking these integrations. Critical concerns:

Unguarded GradientCheckpointingLayer import: src/axolotl/core/trainers/mixins/activation_checkpointing.py:13 imports it from the top-level transformers package without a version gate. By contrast, src/axolotl/monkeypatch/gradient_checkpointing/__init__.py:17 guards it with a > 4.51.3 version check and imports it from transformers.modeling_layers. The two sites differ in both import path and gating. If transformers does not re-export the class from the top-level package, the activation_checkpointing.py import will fail as soon as that module is loaded.

Inconsistent fallback patterns: _flash_supports_window has a three-layer try/except fallback in batch.py:19-30, but _flash_attention_forward (accessed at batch.py:170, 177) has no fallback. If transformers 5.4.0 changed these private APIs, only the former would gracefully degrade.

Private API registry modifications: ALL_ATTENTION_FUNCTIONS is modified at multiple sites without version checks or fallbacks.

Please add try/except protection to the GradientCheckpointingLayer import in activation_checkpointing.py, add a fallback for _flash_attention_forward import, and confirm all monkey-patched paths still exist in transformers 5.4.0.
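A minimal sketch of the guarded GradientCheckpointingLayer import follows. It assumes the only two candidate paths are the ones named in this review: the top-level `transformers` export and `transformers.modeling_layers`. The `LOG` logger and the `is_checkpointing_layer` helper are hypothetical names, not existing axolotl code.

```python
"""Sketch: guarded GradientCheckpointingLayer import (hypothetical, not axolotl's code)."""
import logging

LOG = logging.getLogger(__name__)

try:
    # Path currently used by activation_checkpointing.py: the top-level re-export.
    from transformers import GradientCheckpointingLayer
except ImportError:
    try:
        # Path used by axolotl/monkeypatch/gradient_checkpointing/__init__.py.
        from transformers.modeling_layers import GradientCheckpointingLayer
    except ImportError:
        # Neither path resolves (or transformers is not installed at all):
        # fall back to None so callers can skip the feature instead of failing on import.
        GradientCheckpointingLayer = None
        LOG.warning(
            "GradientCheckpointingLayer not found in transformers; "
            "activation-checkpointing patches will be skipped"
        )


def is_checkpointing_layer(module: object) -> bool:
    """Hypothetical helper: True only if the class resolved and `module` is an instance."""
    return GradientCheckpointingLayer is not None and isinstance(
        module, GradientCheckpointingLayer
    )
```

Call sites would then branch on `GradientCheckpointingLayer is None` (or use the helper) rather than assuming the class exists.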

🤖 Prompt for AI Agents

Verify each finding against the current code and fix it only if needed. In `@requirements.txt` at line 15, the transformers pin moves to 5.4.0; the following fixes apply.

- Wrap the GradientCheckpointingLayer import in activation_checkpointing.py with a try/except ImportError that mirrors the guard in monkeypatch/gradient_checkpointing (__init__.py). First try importing from the top-level transformers export, then fall back to importing from transformers.modeling_layers. If both fail, set GradientCheckpointingLayer = None (or raise a clear error or log a warning), and make code that uses GradientCheckpointingLayer check for None.
- Add a similar multi-path try/except fallback for obtaining _flash_attention_forward in batch.py, following the same three-layer fallback pattern used for _flash_supports_window. The code should then degrade gracefully if the private symbol was moved or removed.
- Guard any modifications to ALL_ATTENTION_FUNCTIONS with runtime checks: hasattr/membership checks, plus a version gate on transformers.__version__ parsed via packaging.version.parse. The monkey-patches should apply only when the expected private API symbols exist and the transformers version is compatible.
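The multi-path fallback for _flash_attention_forward could be sketched as an ordered list of (module, attribute) candidates; only the first one that exists gets patched.

In this sketch, only transformers.modeling_flash_attention_utils._flash_attention_forward comes from the review. `resolve_private_symbol`, `patch_private_symbol` and `FLASH_FORWARD_CANDIDATES` are hypothetical names. Any further candidate paths would have to be confirmed against the installed transformers source before being added.

```python
"""Sketch: multi-path lookup and patching of a private symbol (hypothetical helpers)."""
import importlib
import logging
from types import ModuleType
from typing import Callable, Optional, Sequence, Tuple

LOG = logging.getLogger(__name__)

# Ordered (module, attribute) candidates, most likely first. Only this entry is
# confirmed by the review; append others once verified against transformers 5.4.0.
FLASH_FORWARD_CANDIDATES: Sequence[Tuple[str, str]] = (
    ("transformers.modeling_flash_attention_utils", "_flash_attention_forward"),
)


def resolve_private_symbol(
    candidates: Sequence[Tuple[str, str]],
) -> Optional[Tuple[ModuleType, str]]:
    """Return (module, attribute) for the first candidate that exists, else None."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module moved or package missing; try the next candidate
        if hasattr(module, attr):
            return module, attr
    return None


def patch_private_symbol(
    candidates: Sequence[Tuple[str, str]], replacement: Callable
) -> bool:
    """Rebind the first resolvable candidate to `replacement`; return True on success."""
    target = resolve_private_symbol(candidates)
    if target is None:
        LOG.warning(
            "none of %s found; skipping patch", [".".join(c) for c in candidates]
        )
        return False
    module, attr = target
    setattr(module, attr, replacement)
    return True
```

In batch.py, this would presumably become `patch_private_symbol(FLASH_FORWARD_CANDIDATES, new_forward)`, where `new_forward` stands for the function built by create_flash_attn_forward_varlen_llama3.

`setattr` only rebinds the module attribute. Any module that already ran `from transformers.modeling_flash_attention_utils import _flash_attention_forward` keeps its own reference. The patch therefore has to run before those imports, or be applied to those modules as well.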

@@ -12,7 +12,7 @@ packaging==26.0
     huggingface_hub>=1.1.7
     peft>=0.18.1
     tokenizers>=0.22.1
-    transformers==5.3.0
+    transformers==5.4.0
     accelerate==1.13.0
     datasets==4.5.0
     deepspeed>=0.18.6,<0.19.0

@@ -99,6 +99,9 @@ def _run_sequence_parallel_test(
                 "Train Loss (%s) is too high",
             )
+        @pytest.mark.skip(
+            reason="ring_flash_attn w transformers imports unmaintained upstream",
+        )
         @pytest.mark.parametrize(
             "sample_packing, micro_batch_size, pad_to_sequence_len, ring_attn_func, threshold",
             [

@@ -193,6 +193,7 @@ class TestAssistantDPOChatTemplatePhi3:
         Test class for assistant style datasets with phi-3 prompts using the tokenizer's chat_template strategy.
         """
+        @pytest.mark.xfail(reason="likely upstream issue from v5.4.0")
         def test_phi3_defaults(self, phi3_tokenizer, assistant_dataset):
             transform_fn, _ = default(
                 DictDefault(
@@ ... @@
             assert result["chosen"] == "goodbye<|eot_id|>"
             assert result["rejected"] == "party on<|eot_id|>"
+        @pytest.mark.xfail(reason="likely upstream issue from v5.4.0")
         def test_phi3_argilla_chat(self, phi3_tokenizer, argilla_chat_dataset):
             transform_fn, _ = argilla_chat(
                 DictDefault(
