
fix: docker build failing#3622

Merged
NanoCode012 merged 8 commits into
axolotl-ai-cloud:mainfrom
NanoCode012:chore/uv-cleanup-1
Apr 24, 2026

Conversation

@NanoCode012
Collaborator

@NanoCode012 NanoCode012 commented Apr 24, 2026

Description

  • Fix missing xformers aarch64 gate and bump version
  • Drop 2.9.0 from CI, since 2.9.1 is already covered
  • Clean up docs
  • Remove Flash Attention (FA) from the install notes and let transformers pull it in

BREAKING CHANGE

Flash Attention is no longer installed in the Docker images by default. On newer images it is pulled at runtime from the HF kernels repo, which is a better source.

Ring Flash Attention is no longer installed by default either. If you need it, install it with pip install axolotl[ring-flash-attn]. This also reduces the Docker image size.
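
Because both packages are now optional, it can help to check what is importable before starting a run. The snippet below is a minimal sketch rather than anything shipped in Axolotl: the import names flash_attn and ring_flash_attn are assumptions, and missing_optional_modules is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed import names for the optional packages; adjust if they differ.
OPTIONAL_MODULES = ("flash_attn", "ring_flash_attn")


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_optional_modules():
        print(f"{name} is not installed in this environment")
```

An empty result means every listed module can be imported; anything returned would need to be installed separately, for example through the ring-flash-attn extra mentioned above.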

Motivation and Context

How has this been tested?

AI Usage Disclaimer

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • Chores

    • Bumped version to 0.16.2.dev0
    • Updated minimum PyTorch requirement to 2.9.1 (2.10 in FAQ)
    • Updated minimum xformers version to 0.0.33.post2
    • Updated minimum axolotl[flash-attn] requirement to 0.16.1 across examples
  • Documentation

    • Updated FAQ with PyTorch 2.10 guidance and Docker image information
    • Updated all example installation instructions with new dependency requirements
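
The new minimums mix two-digit minor versions (2.10) with single-digit ones (2.9.1), and a plain string comparison would rank "2.10.0" below "2.9.1". The sketch below shows one way to compare such versions numerically. It is illustrative only: release_tuple and meets_minimum are hypothetical helpers, and they deliberately ignore suffixes such as .dev0 or +cu128, which a full PEP 440 parser would handle.

```python
import re


def release_tuple(version):
    """Extract the leading numeric release segments, e.g. '2.10.0+cu128' -> (2, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare release segments numerically, padding the shorter one with zeros."""
    have, need = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    # Version strings taken from the summary above.
    for installed in ("2.9.0", "2.9.1", "2.10.0"):
        print(installed, "meets 2.9.1:", meets_minimum(installed, "2.9.1"))
```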

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6746525c-a762-41f4-a5e9-a3c341fcd71e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Version bump to 0.16.2.dev0 with corresponding dependency constraint updates. PyTorch minimum requirement raised from 2.6.0 to 2.9.1, Axolotl example version constraints updated to ≥0.16.1, xformers bumped to ≥0.0.33.post2, and related documentation references updated.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Project Version | VERSION | Version bumped from 0.16.0.dev0 to 0.16.2.dev0. |
| Core Dependencies | pyproject.toml | PyTorch minimum raised to ≥2.9.1 (from ≥2.6.0); xformers bumped to ≥0.0.33.post2 with expanded aarch64 exclusion marker. |
| Example Installation Docs | examples/.../README.md (LiquidAI, apertus, arcee, devstral, gemma3n, gpt-oss, granite4, hunyuan, magistral, seed-oss, smolvlm2, voxtral) | PyTorch minimum version constraints updated from 2.6.0–2.7.1 to 2.9.1; Axolotl flash-attn requirement bumped to ≥0.16.1 (from ≥0.12.0). |
| Colab Notebook | examples/colab-notebooks/colab-axolotl-example.ipynb | Installation cell updated to require axolotl[flash-attn]≥0.16.1; cell source reformatted from multi-line JSON string array into consolidated single string. |
| Configuration Documentation | src/axolotl/integrations/kd/README.md, src/axolotl/utils/schemas/config.py | Removed explicit torch≥2.6.0 version references from torch_compile configuration guidance and schema descriptions. |
| FAQ Documentation | docs/faq.qmd | Updated recommended PyTorch from 2.6.0 to 2.10 with Docker image tag change to main-py3.12-cu128-2.10.0 and clarification about Python 3.12. |
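
The xformers entry in pyproject.toml relies on a PEP 508 environment marker, where sys_platform corresponds to sys.platform and platform_machine to platform.machine(). As a rough illustration of when that requirement applies, the sketch below hand-evaluates the marker "sys_platform != 'darwin' and platform_machine != 'aarch64'". The function name is hypothetical, and a real installer evaluates the marker itself.

```python
import platform
import sys


def xformers_requirement_applies(sys_platform=None, platform_machine=None):
    """Hand-evaluate: sys_platform != 'darwin' and platform_machine != 'aarch64'."""
    if sys_platform is None:
        sys_platform = sys.platform
    if platform_machine is None:
        platform_machine = platform.machine()
    return sys_platform != "darwin" and platform_machine != "aarch64"


if __name__ == "__main__":
    # Reports whether the current interpreter's platform would get xformers installed.
    print("xformers requirement applies here:", xformers_requirement_applies())
```

On Linux x86_64 the marker is true and xformers is installed; on macOS or on aarch64 Linux it is false and the requirement is skipped.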

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Possibly related PRs

Suggested labels

ready to merge

Suggested reviewers

  • winglian
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'fix: docker build failing' is vague and does not accurately reflect the primary changes in this PR, which focus on updating dependency versions (PyTorch from 2.6.0 to 2.9.1, Axolotl from 0.12.0 to 0.16.1, xformers, etc.) across multiple files. | Revise the title to specifically reflect the main change, such as 'chore: bump minimum dependency versions (PyTorch 2.9.1, Axolotl 0.16.1, xformers)' or similar, to accurately communicate the scope of updates. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/axolotl/utils/schemas/config.py (1)

1019-1019: Consider documenting auto behavior explicitly in the field description.

Line 1019 is correct, but adding one short note about how auto resolves would make the schema self-explanatory for users reading generated docs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/axolotl/utils/schemas/config.py` at line 1019, Update the field
description that currently reads "Whether to use torch.compile and which backend
to use." to explicitly explain what the "auto" option does; e.g., add a short
sentence saying that "auto" resolves at runtime by selecting an appropriate
backend when torch.compile is available (or falling back/disabled when not
supported), so readers of the schema (the field with description "Whether to use
torch.compile and which backend to use.") immediately understand how "auto" is
handled.
docs/faq.qmd (1)

60-60: Add --ipc=host to the Docker guidance in this FAQ answer.

The updated image tag is good, but users can still hit shared-memory/DataLoader failures if they run without --ipc=host. Please add a short Docker run example (or note) including this flag here.

Based on learnings: For Axolotl Docker commands, the --ipc=host flag should be included by default to prevent shared memory failures with PyTorch DataLoaders and multiprocessing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/faq.qmd` at line 60, Update the FAQ answer that mentions the torch 2.10
Docker tag to include the recommended Docker runtime flag `--ipc=host` and a
concise docker run example; specifically, modify the paragraph referencing the
`main-py3.12-cu128-2.10.0` tag to note that users should run containers with
`--ipc=host` (e.g., docker run --gpus all --ipc=host -it --rm
<image>:main-py3.12-cu128-2.10.0) to avoid PyTorch DataLoader/shared-memory
failures.
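
The suggested invocation can also be assembled programmatically, which keeps the --ipc=host flag from being dropped by accident. This is a sketch only: build_docker_run is a hypothetical helper, and the image name is left as a caller-supplied argument because the review comment does not spell it out.

```python
import shlex

# Tag taken from the FAQ change discussed above.
DEFAULT_TAG = "main-py3.12-cu128-2.10.0"


def build_docker_run(image, tag=DEFAULT_TAG):
    """Return the docker run argument list, always including --ipc=host."""
    return [
        "docker", "run",
        "--gpus", "all",
        "--ipc=host",  # share host IPC so DataLoader workers have enough shared memory
        "-it", "--rm",
        f"{image}:{tag}",
    ]


if __name__ == "__main__":
    # "IMAGE" is a placeholder; substitute the actual image name.
    print(shlex.join(build_docker_run("IMAGE")))
```

Passing the list to subprocess.run avoids shell quoting issues; shlex.join is used here only to print a copy-pasteable command.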

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb6e5df2-dfee-4d56-bcea-948e81531761

📥 Commits

Reviewing files that changed from the base of the PR and between 901f235 and ef2bd82.

📒 Files selected for processing (18)
  • VERSION
  • docs/faq.qmd
  • examples/LiquidAI/README.md
  • examples/apertus/README.md
  • examples/arcee/README.md
  • examples/colab-notebooks/colab-axolotl-example.ipynb
  • examples/devstral/README.md
  • examples/gemma3n/README.md
  • examples/gpt-oss/README.md
  • examples/granite4/README.md
  • examples/hunyuan/README.md
  • examples/magistral/README.md
  • examples/seed-oss/README.md
  • examples/smolvlm2/README.md
  • examples/voxtral/README.md
  • pyproject.toml
  • src/axolotl/integrations/kd/README.md
  • src/axolotl/utils/schemas/config.py

Comment thread examples/colab-notebooks/colab-axolotl-example.ipynb
- "!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
- "!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""
- ]
+ "source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation axolotl[flash-attn]>=0.16.1\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the notebook currently contains an unquoted requirement comparator on the pip line.
python - <<'PY'
import json
from pathlib import Path

nb = json.loads(Path("examples/colab-notebooks/colab-axolotl-example.ipynb").read_text())
for idx, cell in enumerate(nb.get("cells", [])):
    if cell.get("cell_type") != "code":
        continue
    src = cell.get("source", "")
    if isinstance(src, list):
        src = "".join(src)
    if "axolotl[flash-attn]>=0.16.1" in src:
        print(f"Found in code cell index {idx}:")
        for line in src.splitlines():
            if "axolotl[flash-attn]" in line:
                print(line)
PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 158


🏁 Script executed:

python - <<'PY'
import json
from pathlib import Path

nb = json.loads(Path("examples/colab-notebooks/colab-axolotl-example.ipynb").read_text())
for idx, cell in enumerate(nb.get("cells", [])):
    if cell.get("cell_type") != "code":
        continue
    src = cell.get("source", "")
    if isinstance(src, list):
        src = "".join(src)
    # Look for pip install lines with unquoted version specifiers
    if "!pip install" in src:
        print(f"Cell {idx}:")
        for line in src.splitlines():
            if "!pip install" in line:
                print(f"  {line}")
PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 260


Quote the pip version constraint in the shell command

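Line 39 runs pip through ! shell execution and leaves the requirement unquoted, so the shell parses > as output redirection rather than as part of the version specifier. pip receives only axolotl[flash-attn], its output is written to a file named =0.16.1, and the minimum-version constraint is silently dropped, so pip installs whatever version it resolves.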

Proposed fix
-   "source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation axolotl[flash-attn]>=0.16.1\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""
+   "source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation \"axolotl[flash-attn]>=0.16.1\"\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/colab-notebooks/colab-axolotl-example.ipynb` at line 39, The pip
version constraint for axolotl[flash-attn]>=0.16.1 is being parsed by the shell
as a redirection; update the notebook cell so the package specifier is quoted
(e.g., wrap axolotl[flash-attn]>=0.16.1 in quotes) so the >=0.16.1 constraint is
passed to pip instead of the shell—modify the cell containing the !pip install
--no-build-isolation axolotl[flash-attn]>=0.16.1 command accordingly (the second
install for cut-cross-entropy can remain as-is).
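
The redirection behaviour is easy to reproduce outside the notebook. The sketch below is an illustration that assumes a POSIX shell is available; it uses echo as a harmless stand-in for pip and runs inside a temporary directory, so nothing is installed and no stray files are left behind.

```python
import subprocess
import tempfile
from pathlib import Path

SPEC = "axolotl[flash-attn]>=0.16.1"


def run_in_shell(command, cwd):
    """Run a command line through the system shell, as a notebook `!` line does."""
    result = subprocess.run(
        command, shell=True, cwd=cwd, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    # Unquoted: the shell treats `>` as redirection into a file named "=0.16.1".
    unquoted_stdout = run_in_shell(f"echo {SPEC}", tmp)
    stray_file = Path(tmp) / "=0.16.1"
    stray_file_created = stray_file.exists()
    stray_file_text = stray_file.read_text().strip() if stray_file_created else ""

    # Quoted: the whole specifier reaches the command as a single argument.
    quoted_stdout = run_in_shell(f'echo "{SPEC}"', tmp)

print("unquoted stdout:", repr(unquoted_stdout))
print("stray file created:", stray_file_created, "containing", repr(stray_file_text))
print("quoted stdout:", repr(quoted_stdout))
```

In the unquoted case the command sees only axolotl[flash-attn], which is the argument pip would have received from the notebook cell.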

Comment thread pyproject.toml
  "bitsandbytes==0.49.1 ; sys_platform != 'darwin'",
  "triton>=3.4.0 ; sys_platform != 'darwin'",
- "xformers>=0.0.23.post1 ; sys_platform != 'darwin'",
+ "xformers>=0.0.33.post2 ; sys_platform != 'darwin' and platform_machine != 'aarch64'",

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether xformers_attention is validated against platform/arch.
# Expected: either explicit aarch64 guard exists, or this confirms it is missing.

set -euo pipefail

echo "== xformers_attention references =="
rg -n -C3 '\bxformers_attention\b'

echo
echo "== xformers import/error paths =="
rg -n -C3 'import xformers|xformers not found|llama_attn_hijack_xformers'

echo
echo "== Platform/arch guards related to xformers =="
rg -n -C3 'aarch64|platform_machine|sys_platform|xformers'

Repository: axolotl-ai-cloud/axolotl

Length of output: 48435


🏁 Script executed:

# Check the complete validation.py file around the check_attention_fields method
cat -n src/axolotl/utils/schemas/validation.py | sed -n '180,230p'

Repository: axolotl-ai-cloud/axolotl

Length of output: 2338


🏁 Script executed:

# Search for any platform_machine or aarch64 checks in validation files
rg -n "platform_machine|aarch64" src/axolotl/utils/schemas/

Repository: axolotl-ai-cloud/axolotl

Length of output: 50


🏁 Script executed:

# Check the patch_manager to see when xformers patches are applied and if there are guards
cat -n src/axolotl/loaders/patch_manager.py | sed -n '735,760p'

Repository: axolotl-ai-cloud/axolotl

Length of output: 1313


🏁 Script executed:

# Look for any early validation of xformers_attention before patches
rg -n "xformers_attention" src/axolotl/utils/schemas/validation.py -A 5 -B 5

Repository: axolotl-ai-cloud/axolotl

Length of output: 976


Add a guard for xformers_attention on aarch64 to fail early.

Users can still enable xformers_attention in configuration on aarch64 systems even though xformers is excluded from installation on that platform. This causes a delayed runtime import error when patches are applied. Add upfront validation in src/axolotl/utils/schemas/validation.py to reject this configuration on aarch64 systems and provide a clear error message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 82, Add an early platform guard in
src/axolotl/utils/schemas/validation.py to reject configs that enable
xformers_attention on aarch64: implement a validator (e.g.,
validate_xformers_on_aarch64) that checks platform.machine() (or
platform.uname().machine) for "aarch64" and if so raises a clear ValueError when
the config key xformers_attention is true; wire this validator into the existing
configuration validation flow (the module's main validate_config / schema
validation entrypoint) so the error is raised at validation time rather than at
runtime import.
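
A guard along the lines the review describes might look like the sketch below. It is not the project's actual validation code: the config is modelled as a plain dict for brevity, the machine argument exists only to make the check testable, and how such a function would be wired into src/axolotl/utils/schemas/validation.py is left open.

```python
import platform
from typing import Optional


def validate_xformers_on_aarch64(cfg: dict, machine: Optional[str] = None) -> dict:
    """Reject xformers_attention on aarch64, where xformers is not installed."""
    if machine is None:
        machine = platform.machine()
    if cfg.get("xformers_attention") and machine == "aarch64":
        raise ValueError(
            "xformers_attention is enabled, but xformers is not installed on "
            "aarch64. Disable xformers_attention or choose another attention "
            "implementation."
        )
    return cfg


if __name__ == "__main__":
    # Passes on x86_64; the same config raises ValueError when machine is aarch64.
    validate_xformers_on_aarch64({"xformers_attention": True}, machine="x86_64")
    try:
        validate_xformers_on_aarch64({"xformers_attention": True}, machine="aarch64")
    except ValueError as err:
        print("rejected:", err)
```

Failing at validation time surfaces the problem before any model loading or patching begins, instead of as an import error partway through startup.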

@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@NanoCode012 NanoCode012 merged commit 17fc747 into axolotl-ai-cloud:main Apr 24, 2026
1 check passed
@NanoCode012 NanoCode012 deleted the chore/uv-cleanup-1 branch April 24, 2026 07:23