fix: docker build failing by NanoCode012 · Pull Request #3622 · axolotl-ai-cloud/axolotl

NanoCode012 · 2026-04-24T05:16:34Z

Description

- Fix missing xformers aarch64 gate and bump version
- Drop PyTorch 2.9.0 from CI, since 2.9.1 is already covered
- Clean up docs
- Remove Flash Attention (FA) from the install notes and let transformers pull it in

BREAKING CHANGE

Flash Attention is no longer installed in the Docker images by default. On newer images it is pulled at runtime from the HF kernels repo, which is a better source.

Ring flash attention is no longer installed by default either. Users who need it can install it with `pip install axolotl[ring-flash-attn]`. This also reduces our Docker image size.

Motivation and Context

How has this been tested?

AI Usage Disclaimer

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

Chores
- Bumped version to 0.16.2.dev0
- Updated minimum PyTorch requirement to 2.9.1 (2.10 in FAQ)
- Updated minimum xformers version to 0.0.33.post2
- Updated minimum axolotl[flash-attn] requirement to 0.16.1 across examples
Documentation
- Updated FAQ with PyTorch 2.10 guidance and Docker image information
- Updated all example installation instructions with new dependency requirements

coderabbitai · 2026-04-24T05:16:49Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6746525c-a762-41f4-a5e9-a3c341fcd71e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Version bump to 0.16.2.dev0 with corresponding dependency constraint updates. PyTorch minimum requirement raised from 2.6.0 to 2.9.1, Axolotl example version constraints updated to ≥0.16.1, xformers bumped to ≥0.0.33.post2, and related documentation references updated.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Project Version `VERSION` | Version bumped from `0.16.0.dev0` to `0.16.2.dev0`. |
| Core Dependencies `pyproject.toml` | PyTorch minimum raised to `≥2.9.1` (from `≥2.6.0`); xformers bumped to `≥0.0.33.post2` with expanded `aarch64` exclusion marker. |
| Example Installation Docs `examples/.../README.md` (LiquidAI, apertus, arcee, devstral, gemma3n, gpt-oss, granite4, hunyuan, magistral, seed-oss, smolvlm2, voxtral) | PyTorch minimum version constraints updated from 2.6.0–2.7.1 to 2.9.1; Axolotl `flash-attn` requirement bumped to `≥0.16.1` (from `≥0.12.0`). |
| Colab Notebook `examples/colab-notebooks/colab-axolotl-example.ipynb` | Installation cell updated to require `axolotl[flash-attn]≥0.16.1`; cell source reformatted from multi-line JSON string array into consolidated single string. |
| Configuration Documentation `src/axolotl/integrations/kd/README.md`, `src/axolotl/utils/schemas/config.py` | Removed explicit `torch≥2.6.0` version references from `torch_compile` configuration guidance and schema descriptions. |
| FAQ Documentation `docs/faq.qmd` | Updated recommended PyTorch from 2.6.0 to 2.10 with Docker image tag change to `main-py3.12-cu128-2.10.0` and clarification about Python 3.12. |
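
The Colab Notebook row notes that the cell source changed from a list of strings to a single string. Both forms are valid in the notebook JSON, so tooling that inspects cells usually normalizes them first. A minimal sketch of that normalization follows; the helper name `cell_source_text` and the two sample cells are illustrative assumptions, not code from this PR.

```python
def cell_source_text(cell: dict) -> str:
    """Return a notebook cell's source as one string.

    Notebook JSON may store `source` either as a list of lines or as a
    single string; this helper (a hypothetical name) accepts both.
    """
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


# Two made-up cells holding the same content in the two storage forms.
as_list = {"cell_type": "code", "source": ["%%capture\n", "!pip install example-package"]}
as_string = {"cell_type": "code", "source": "%%capture\n!pip install example-package"}

print(cell_source_text(as_list) == cell_source_text(as_string))
```

The two forms compare equal once joined, which is why the reformatting in the notebook does not change what the cell executes.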

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Possibly related PRs

bump dev version to 0.16.0.dev0 #3472: Bumps VERSION file to the same 0.16.0.dev0, setting foundation for this version update.
deprecate torch 2.8.0 support #3550: Raises supported PyTorch versions by removing/deprecating PyTorch 2.8.0 from CI workflows, aligning with this PR's 2.9.1 floor.
feat(doc): standardize the axolotl install to a release #3040: Updates example README installation instructions with matching Axolotl version constraints and PyTorch requirements.

Suggested labels

ready to merge

Suggested reviewers

winglian
SalmanMohammadi

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title 'fix: docker build failing' is vague and does not accurately reflect the primary changes in this PR, which focus on updating dependency versions (PyTorch from 2.6.0 to 2.9.1, Axolotl from 0.12.0 to 0.16.1, xformers, etc.) across multiple files. | Revise the title to specifically reflect the main change, such as 'chore: bump minimum dependency versions (PyTorch 2.9.1, Axolotl 0.16.1, xformers)' or similar, to accurately communicate the scope of updates. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/axolotl/utils/schemas/config.py (1)
1019-1019: Consider documenting auto behavior explicitly in the field description.

Line 1019 is correct, but adding one short note about how auto resolves would make the schema self-explanatory for users reading generated docs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/axolotl/utils/schemas/config.py` at line 1019, Update the field
description that currently reads "Whether to use torch.compile and which backend
to use." to explicitly explain what the "auto" option does; e.g., add a short
sentence saying that "auto" resolves at runtime by selecting an appropriate
backend when torch.compile is available (or falling back/disabled when not
supported), so readers of the schema (the field with description "Whether to use
torch.compile and which backend to use.") immediately understand how "auto" is
handled.
docs/faq.qmd (1)
60-60: Add --ipc=host to the Docker guidance in this FAQ answer.

The updated image tag is good, but users can still hit shared-memory/DataLoader failures if they run without --ipc=host. Please add a short Docker run example (or note) including this flag here.

Based on learnings: For Axolotl Docker commands, the --ipc=host flag should be included by default to prevent shared memory failures with PyTorch DataLoaders and multiprocessing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/faq.qmd` at line 60, Update the FAQ answer that mentions the torch 2.10
Docker tag to include the recommended Docker runtime flag `--ipc=host` and a
concise docker run example; specifically, modify the paragraph referencing the
`main-py3.12-cu128-2.10.0` tag to note that users should run containers with
`--ipc=host` (e.g., docker run --gpus all --ipc=host -it --rm
<image>:main-py3.12-cu128-2.10.0) to avoid PyTorch DataLoader/shared-memory
failures.
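
As a rough illustration of the suggested invocation, the sketch below assembles the `docker run` argument list with `--ipc=host` included by default. The function name `docker_run_command` is hypothetical, and the image repository is left as the `<image>` placeholder because the comment above does not name it.

```python
from __future__ import annotations

import shlex


def docker_run_command(image: str, tag: str = "main-py3.12-cu128-2.10.0") -> list[str]:
    """Build a `docker run` argv that always shares the host IPC namespace."""
    return [
        "docker", "run",
        "--gpus", "all",
        "--ipc=host",  # avoids shared-memory failures in PyTorch DataLoader workers
        "-it", "--rm",
        f"{image}:{tag}",
    ]


# "<image>" is a placeholder; substitute the actual image repository.
cmd = docker_run_command("<image>")
print(shlex.join(cmd))
```

Building the command as a list keeps the flag from being dropped by accident when the image or tag changes.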

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/colab-notebooks/colab-axolotl-example.ipynb`:
- Line 39: The pip version constraint for axolotl[flash-attn]>=0.16.1 is being
parsed by the shell as a redirection; update the notebook cell so the package
specifier is quoted (e.g., wrap axolotl[flash-attn]>=0.16.1 in quotes) so the
>=0.16.1 constraint is passed to pip instead of the shell—modify the cell
containing the !pip install --no-build-isolation axolotl[flash-attn]>=0.16.1
command accordingly (the second install for cut-cross-entropy can remain as-is).

In `@pyproject.toml`:
- Line 82: Add an early platform guard in
src/axolotl/utils/schemas/validation.py to reject configs that enable
xformers_attention on aarch64: implement a validator (e.g.,
validate_xformers_on_aarch64) that checks platform.machine() (or
platform.uname().machine) for "aarch64" and if so raises a clear ValueError when
the config key xformers_attention is true; wire this validator into the existing
configuration validation flow (the module's main validate_config / schema
validation entrypoint) so the error is raised at validation time rather than at
runtime import.

---

Nitpick comments:
In `@docs/faq.qmd`:
- Line 60: Update the FAQ answer that mentions the torch 2.10 Docker tag to
include the recommended Docker runtime flag `--ipc=host` and a concise docker
run example; specifically, modify the paragraph referencing the
`main-py3.12-cu128-2.10.0` tag to note that users should run containers with
`--ipc=host` (e.g., docker run --gpus all --ipc=host -it --rm
<image>:main-py3.12-cu128-2.10.0) to avoid PyTorch DataLoader/shared-memory
failures.

In `@src/axolotl/utils/schemas/config.py`:
- Line 1019: Update the field description that currently reads "Whether to use
torch.compile and which backend to use." to explicitly explain what the "auto"
option does; e.g., add a short sentence saying that "auto" resolves at runtime
by selecting an appropriate backend when torch.compile is available (or falling
back/disabled when not supported), so readers of the schema (the field with
description "Whether to use torch.compile and which backend to use.")
immediately understand how "auto" is handled.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb6e5df2-dfee-4d56-bcea-948e81531761

📥 Commits

Reviewing files that changed from the base of the PR and between 901f235 and ef2bd82.

📒 Files selected for processing (18)

VERSION
docs/faq.qmd
examples/LiquidAI/README.md
examples/apertus/README.md
examples/arcee/README.md
examples/colab-notebooks/colab-axolotl-example.ipynb
examples/devstral/README.md
examples/gemma3n/README.md
examples/gpt-oss/README.md
examples/granite4/README.md
examples/hunyuan/README.md
examples/magistral/README.md
examples/seed-oss/README.md
examples/smolvlm2/README.md
examples/voxtral/README.md
pyproject.toml
src/axolotl/integrations/kd/README.md
src/axolotl/utils/schemas/config.py

coderabbitai · 2026-04-24T05:21:05Z

-    "!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
-    "!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""
-   ]
+   "source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation axolotl[flash-attn]>=0.16.1\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the notebook currently contains an unquoted requirement comparator on the pip line.
python - <<'PY'
import json
from pathlib import Path

nb = json.loads(Path("examples/colab-notebooks/colab-axolotl-example.ipynb").read_text())
for idx, cell in enumerate(nb.get("cells", [])):
    if cell.get("cell_type") != "code":
        continue
    src = cell.get("source", "")
    if isinstance(src, list):
        src = "".join(src)
    if "axolotl[flash-attn]>=0.16.1" in src:
        print(f"Found in code cell index {idx}:")
        for line in src.splitlines():
            if "axolotl[flash-attn]" in line:
                print(line)
PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 158

🏁 Script executed:

python - <<'PY'
import json
from pathlib import Path

nb = json.loads(Path("examples/colab-notebooks/colab-axolotl-example.ipynb").read_text())
for idx, cell in enumerate(nb.get("cells", [])):
    if cell.get("cell_type") != "code":
        continue
    src = cell.get("source", "")
    if isinstance(src, list):
        src = "".join(src)
    # Look for pip install lines with unquoted version specifiers
    if "!pip install" in src:
        print(f"Cell {idx}:")
        for line in src.splitlines():
            if "!pip install" in line:
                print(f"  {line}")
PY

Repository: axolotl-ai-cloud/axolotl

Length of output: 260

Quote the pip version constraint in the shell command

Line 39 uses ! shell execution, so >=0.16.1 will be interpreted as shell redirection (>) instead of a pip version specifier. This bypasses the intended minimum version constraint and installs an unpinned version, breaking reproducibility.
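
To make the redirection concrete, here is a small sketch that runs the same package specifier through a POSIX shell with and without quotes, using `echo` in place of `pip install` so nothing gets installed. The helper `run_in_empty_dir` is a hypothetical name, and the example assumes `/bin/sh` is available.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def run_in_empty_dir(command: str) -> tuple[str, list[str]]:
    """Run a shell command in a fresh directory; return (stdout, files created)."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            command, shell=True, cwd=workdir,
            capture_output=True, text=True, check=True,
        )
        created = sorted(p.name for p in Path(workdir).iterdir())
    return result.stdout, created


# Unquoted: the shell treats ">" as output redirection to a file named "=0.16.1".
unquoted = run_in_empty_dir("echo axolotl[flash-attn]>=0.16.1")
# Quoted: the whole specifier reaches the command as one argument.
quoted = run_in_empty_dir('echo "axolotl[flash-attn]>=0.16.1"')

print("unquoted:", unquoted)
print("quoted:  ", quoted)
```

In the unquoted case the command receives only `axolotl[flash-attn]` and a stray file named `=0.16.1` appears in the working directory; in the quoted case the full specifier is passed through and no file is created.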

Proposed fix

- "source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation axolotl[flash-attn]>=0.16.1\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""
+ "source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation \"axolotl[flash-attn]>=0.16.1\"\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation axolotl[flash-attn]>=0.16.1\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""

"source": "%%capture\n# This step can take ~5-10 minutes to install dependencies\n!pip install --no-build-isolation \"axolotl[flash-attn]>=0.16.1\"\n!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@fec1a88\""

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/colab-notebooks/colab-axolotl-example.ipynb` at line 39, The pip version constraint for axolotl[flash-attn]>=0.16.1 is being parsed by the shell as a redirection; update the notebook cell so the package specifier is quoted (e.g., wrap axolotl[flash-attn]>=0.16.1 in quotes) so the >=0.16.1 constraint is passed to pip instead of the shell—modify the cell containing the !pip install --no-build-isolation axolotl[flash-attn]>=0.16.1 command accordingly (the second install for cut-cross-entropy can remain as-is).

coderabbitai · 2026-04-24T05:21:06Z

    "bitsandbytes==0.49.1 ; sys_platform != 'darwin'",
    "triton>=3.4.0 ; sys_platform != 'darwin'",
-    "xformers>=0.0.23.post1 ; sys_platform != 'darwin'",
+    "xformers>=0.0.33.post2 ; sys_platform != 'darwin' and platform_machine != 'aarch64'",
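
For readers unfamiliar with environment markers, the new marker can be read as a plain boolean expression: `sys_platform` corresponds to Python's `sys.platform` and `platform_machine` to `platform.machine()`. The sketch below mirrors that expression; the function name `xformers_marker_applies` is an illustrative assumption, and this is not how pip itself evaluates markers internally.

```python
import platform
import sys


def xformers_marker_applies(sys_platform: str, platform_machine: str) -> bool:
    """Mirror the marker: sys_platform != 'darwin' and platform_machine != 'aarch64'."""
    return sys_platform != "darwin" and platform_machine != "aarch64"


print(xformers_marker_applies("linux", "x86_64"))   # x86-64 Linux
print(xformers_marker_applies("linux", "aarch64"))  # ARM64 Linux
print(xformers_marker_applies("darwin", "arm64"))   # Apple Silicon macOS
print(xformers_marker_applies(sys.platform, platform.machine()))  # this machine
```

Only the first combination satisfies the marker, so xformers is now skipped on ARM64 Linux as well as on macOS.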


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether xformers_attention is validated against platform/arch.
# Expected: either explicit aarch64 guard exists, or this confirms it is missing.
set -euo pipefail

echo "== xformers_attention references =="
rg -n -C3 '\bxformers_attention\b'

echo
echo "== xformers import/error paths =="
rg -n -C3 'import xformers|xformers not found|llama_attn_hijack_xformers'

echo
echo "== Platform/arch guards related to xformers =="
rg -n -C3 'aarch64|platform_machine|sys_platform|xformers'

Repository: axolotl-ai-cloud/axolotl

Length of output: 48435

🏁 Script executed:

# Check the complete validation.py file around the check_attention_fields method
cat -n src/axolotl/utils/schemas/validation.py | sed -n '180,230p'

Repository: axolotl-ai-cloud/axolotl

Length of output: 2338

🏁 Script executed:

# Search for any platform_machine or aarch64 checks in validation files
rg -n "platform_machine|aarch64" src/axolotl/utils/schemas/

Repository: axolotl-ai-cloud/axolotl

Length of output: 50

🏁 Script executed:

# Check the patch_manager to see when xformers patches are applied and if there are guards
cat -n src/axolotl/loaders/patch_manager.py | sed -n '735,760p'

Repository: axolotl-ai-cloud/axolotl

Length of output: 1313

🏁 Script executed:

# Look for any early validation of xformers_attention before patches
rg -n "xformers_attention" src/axolotl/utils/schemas/validation.py -A 5 -B 5

Repository: axolotl-ai-cloud/axolotl

Length of output: 976

Add a guard for xformers_attention on aarch64 to fail early.

Users can still enable xformers_attention in configuration on aarch64 systems even though xformers is excluded from installation on that platform. This causes a delayed runtime import error when patches are applied. Add upfront validation in src/axolotl/utils/schemas/validation.py to reject this configuration on aarch64 systems and provide a clear error message.
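
One possible shape for such a guard is sketched below as a standalone function over a plain dict. It is only an illustration: the name `validate_xformers_on_aarch64` comes from the suggestion in this review, while the dict-based signature, the `machine` override, and the error wording are assumptions and do not reflect how Axolotl's validation module is actually structured.

```python
from __future__ import annotations

import platform


def validate_xformers_on_aarch64(cfg: dict, machine: str | None = None) -> dict:
    """Reject configs that enable xformers_attention on aarch64.

    `machine` defaults to platform.machine(); it is a parameter here only so
    the check can be exercised on any host (an assumption of this sketch).
    """
    machine = machine or platform.machine()
    if cfg.get("xformers_attention") and machine == "aarch64":
        raise ValueError(
            "xformers_attention is enabled, but xformers is not installed on "
            "aarch64. Disable xformers_attention or use another attention backend."
        )
    return cfg


# Passes on x86_64, fails fast on aarch64.
validate_xformers_on_aarch64({"xformers_attention": True}, machine="x86_64")
try:
    validate_xformers_on_aarch64({"xformers_attention": True}, machine="aarch64")
except ValueError as err:
    print(f"rejected: {err}")
```

Raising at validation time surfaces the problem before any model loading starts, rather than as an import error when the patches are applied.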

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 82, Add an early platform guard in src/axolotl/utils/schemas/validation.py to reject configs that enable xformers_attention on aarch64: implement a validator (e.g., validate_xformers_on_aarch64) that checks platform.machine() (or platform.uname().machine) for "aarch64" and if so raises a clear ValueError when the config key xformers_attention is true; wire this validator into the existing configuration validation flow (the module's main validate_config / schema validation entrypoint) so the error is raised at validation time rather than at runtime import.

codecov · 2026-04-24T05:32:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

NanoCode012 added 2 commits April 24, 2026 11:50

fix: uv leftover docs

5060314

fix: docker build failing

ef2bd82

coderabbitai Bot reviewed Apr 24, 2026

NanoCode012 added 3 commits April 24, 2026 12:21

chore: doc

717f522

fix: remove old pytorch build

e327848

fix: stop recommend flash-attn optional, let transformers pull

cec102f

NanoCode012 added 3 commits April 24, 2026 12:35

fix: remove ring flash attention from image

9a7abed

fix: quotes [skip ci]

c2d8d48

chore: naming [skip ci]

2805fa0

NanoCode012 merged commit 17fc747 into axolotl-ai-cloud:main Apr 24, 2026
1 check passed

NanoCode012 deleted the chore/uv-cleanup-1 branch April 24, 2026 07:23

fix: docker build failing #3622
NanoCode012 merged 8 commits into axolotl-ai-cloud:main from NanoCode012:chore/uv-cleanup-1