deprecate torch 2.7.1 #3339

Merged
winglian merged 2 commits into main from deprecate-torch27x on Jan 1, 2026

winglian (Collaborator) commented Dec 30, 2025

Summary by CodeRabbit

  • Chores

    • Updated CI/CD workflows to use CUDA 12.8.1 and PyTorch 2.8.0–2.9.1, removing support for older CUDA 12.6.3 configurations.
    • Updated Docker image tags to reflect newer versions across cloud training integrations.
  • Documentation

    • Updated minimum PyTorch requirement from 2.7.1 to 2.8.0 in Quick Start guide.
    • Updated installation examples and Blackwell GPU guidance with latest PyTorch/CUDA versions.

coderabbitai Bot (Contributor) commented Dec 30, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Multiple GitHub Actions workflow files (.github/workflows/*) are updated to remove legacy CUDA 12.6.3 and PyTorch 2.7.1 matrix entries, standardizing on CUDA 12.8.1 and PyTorch 2.8.0–2.9.1. Documentation and base image references are updated correspondingly with new Docker tag versions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CI Workflow Matrix Consolidation: .github/workflows/base.yml, main.yml, multi-gpu-e2e.yml, nightlies.yml, tests-nightly.yml, tests.yml | Removed legacy CUDA 126 (12.6.3) and PyTorch 2.7.1 matrix entries; updated PyTorch versions from 2.7.x→2.8.0 and 2.8.0→2.9.1 across include blocks; streamlined matrix configurations to focus on CUDA 128 (12.8.1) with newer PyTorch versions. |
| Documentation Version Updates: README.md, docs/docker.qmd, docs/installation.qmd | Updated PyTorch version requirement from ≥2.7.1 to ≥2.8.0 in README; refreshed Docker image tag examples (removed 2.7.x entries, added 2.8.0 and 2.9.1); updated Blackwell GPU guidance to reference PyTorch 2.9.1 with CUDA 12.8. |
| Base Image Configuration: src/axolotl/cli/cloud/baseten/template/train_sft.py, src/axolotl/cli/cloud/modal_.py | Updated Docker base image tag from main-py3.11-cu126-2.7.1 to main-py3.11-cu128-2.9.1 in both training template and ModalCloud configuration. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~20 minutes

Suggested labels

ready to merge

Suggested reviewers

  • SalmanMohammadi
  • NanoCode012

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'deprecate torch 2.7.1' directly matches the main objective of the PR, which systematically removes PyTorch 2.7.1 from CI/CD workflows and documentation across multiple files. |

winglian marked this pull request as ready for review December 31, 2025 11:55

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/docker.qmd (1)

12-12: Update Blackwell GPU guidance to reflect PyTorch 2.9.1.

This callout still references PyTorch 2.7.1, which is being deprecated in this PR. Based on the changes in docs/installation.qmd (lines 29 and 114), Blackwell GPUs should now use PyTorch 2.9.1 with CUDA 12.8.

🔎 Proposed fix
-For Blackwell GPUs, please use the tags with PyTorch 2.7.1 and CUDA 12.8.
+For Blackwell GPUs, please use the tags with PyTorch 2.9.1 and CUDA 12.8.
.github/workflows/main.yml (1)

60-60: Add missing base image configuration for pytorch 2.9.0.

The main.yml workflow requires base images for cuda 128 with pytorch 2.8.0, 2.9.0, and 2.9.1. However, base.yml only builds images for 2.8.0 and 2.9.1. The BASE_TAG reference for pytorch 2.9.0 (line 27 in main.yml) will fail because the corresponding base image main-base-py3.11-cu128-2.9.0 does not exist.

Add a matrix entry to base.yml for cuda 128 with pytorch 2.9.0, or remove the 2.9.0 configuration from main.yml.
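
The mismatch described above could be caught mechanically. The following is a minimal sketch, not part of the PR: the matrices are transcribed by hand from this comment rather than parsed from the workflow files, the helper names `base_tag` and `missing_base_images` are hypothetical, and the tag format is assumed from the `main-base-py3.11-cu128-2.9.0` example.

```python
def base_tag(entry, python_version="3.11"):
    """Build a base image tag such as main-base-py3.11-cu128-2.9.0 (assumed format)."""
    return f"main-base-py{python_version}-cu{entry['cuda']}-{entry['pytorch']}"


def missing_base_images(main_matrix, base_matrix):
    """Return the base tags that main.yml would need but base.yml would not build."""
    built = {base_tag(entry) for entry in base_matrix}
    needed = {base_tag(entry) for entry in main_matrix}
    return sorted(needed - built)


# Hand-transcribed from the review comment, not read from the real workflow files.
base_matrix = [
    {"cuda": 128, "pytorch": "2.8.0"},
    {"cuda": 128, "pytorch": "2.9.1"},
]
main_matrix = [
    {"cuda": 128, "pytorch": "2.8.0"},
    {"cuda": 128, "pytorch": "2.9.0"},
    {"cuda": 128, "pytorch": "2.9.1"},
]

missing = missing_base_images(main_matrix, base_matrix)
print(missing)  # ['main-base-py3.11-cu128-2.9.0']
```

An empty result would mean every image referenced by main.yml has a matching base build.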

🧹 Nitpick comments (1)
.github/workflows/tests.yml (1)

306-318: Consider adding PyTorch 2.9.0 to the e2e test matrix.

The e2e tests only cover PyTorch 2.8.0 and 2.9.1, omitting 2.9.0 that is tested in unit tests and built in the main workflow. While this may be intentional to reduce CI costs, it creates a gap where 2.9.0 Docker images are built and unit-tested but not e2e-tested.

If comprehensive coverage is desired, add a 2.9.0 entry:

🔎 Suggested addition for complete coverage
           - cuda: 128
             cuda_version: 12.8.1
             python_version: "3.11"
             pytorch: 2.8.0
             num_gpus: 1
             gpu_type: "B200"
             axolotl_extras: fbgemm-gpu
+          - cuda: 128
+            cuda_version: 12.8.1
+            python_version: "3.11"
+            pytorch: 2.9.0
+            num_gpus: 1
+            axolotl_extras:
           - cuda: 128
             cuda_version: 12.8.1
             python_version: "3.11"
             pytorch: 2.9.1
             num_gpus: 1
             axolotl_extras:
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e73dab6 and 97c93e4.

📒 Files selected for processing (11)
  • .github/workflows/base.yml
  • .github/workflows/main.yml
  • .github/workflows/multi-gpu-e2e.yml
  • .github/workflows/nightlies.yml
  • .github/workflows/tests-nightly.yml
  • .github/workflows/tests.yml
  • README.md
  • docs/docker.qmd
  • docs/installation.qmd
  • src/axolotl/cli/cloud/baseten/template/train_sft.py
  • src/axolotl/cli/cloud/modal_.py
💤 Files with no reviewable changes (1)
  • .github/workflows/base.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-02T02:56:20.788Z
Learnt from: NanoCode012
Repo: axolotl-ai-cloud/axolotl PR: 2854
File: README.md:73-77
Timestamp: 2025-07-02T02:56:20.788Z
Learning: For Axolotl Docker commands, the `--ipc=host` flag should be included by default to prevent shared memory failures that commonly occur with PyTorch DataLoaders and multiprocessing during machine learning training workflows.

Applied to files:

  • docs/installation.qmd
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: PyTest (3.11, 2.9.1)
  • GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.9.1)
  • GitHub Check: preview
  • GitHub Check: PyTest (3.11, 2.9.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
  • GitHub Check: PyTest (3.11, 2.8.0)
🔇 Additional comments (15)
README.md (1)

80-80: LGTM! PyTorch minimum version updated correctly.

The requirement update to PyTorch ≥2.8.0 aligns with the PR's objective to deprecate PyTorch 2.7.1 and standardize on newer versions.
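
As an illustration of what the ≥2.8.0 requirement means in practice, here is a rough sketch of a version check. It is not code from the repository: `parse_version` and `meets_minimum` are hypothetical helpers, and the parsing deliberately ignores local suffixes such as `+cu128`.

```python
import re

MINIMUM_TORCH = (2, 8, 0)  # minimum stated in the updated README


def parse_version(version):
    """Return the leading major.minor[.patch] of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum=MINIMUM_TORCH):
    """True when the version is at least the minimum, compared numerically."""
    return parse_version(version) >= minimum


for candidate in ("2.7.1", "2.8.0", "2.9.1+cu128"):
    print(candidate, meets_minimum(candidate))
```

With PyTorch installed, the same check could presumably be run as `meets_minimum(torch.__version__)`. Comparing integer tuples rather than raw strings avoids ordering a future `2.10.0` below `2.8.0`.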

.github/workflows/multi-gpu-e2e.yml (1)

32-45: LGTM! Multi-GPU test matrix updated to latest PyTorch versions.

The matrix now tests PyTorch 2.8.0 and 2.9.1 with CUDA 12.8.1, removing the deprecated 2.7.1 configuration. This provides good coverage of supported PyTorch versions.

docs/docker.qmd (2)

35-36: LGTM! Base image tag examples updated correctly.

The tag examples now reflect PyTorch 2.8.0 and 2.9.1 with CUDA 12.8, consistent with the deprecation of PyTorch 2.7.1.


74-79: LGTM! Main image tag examples updated correctly.

The tag examples have been updated to PyTorch 2.8.0 and 2.9.1. The version example update from 0.10.1 to 0.12.0 appears to be bringing the documentation in line with a more recent release.

docs/installation.qmd (2)

29-29: LGTM! Blackwell GPU guidance updated correctly.

The PyTorch version requirement for Blackwell GPUs has been updated to 2.9.1 with CUDA 12.8, consistent with the deprecation of PyTorch 2.7.1.


114-114: LGTM! Blackwell Docker image recommendation updated correctly.

The Docker image tags for Blackwell GPUs now reference PyTorch 2.9.1 with CUDA 12.8, maintaining consistency with the installation guidance on line 29.

.github/workflows/nightlies.yml (2)

15-24: LGTM! Nightly build matrix updated to current PyTorch versions.

The build-axolotl matrix has been updated to build images for PyTorch 2.8.0 and 2.9.1 with CUDA 12.8.1, removing the deprecated PyTorch 2.7.1 configuration.


67-76: LGTM! Cloud image matrix updated consistently.

The build-axolotl-cloud matrix mirrors the build-axolotl updates, ensuring consistent PyTorch 2.8.0 and 2.9.1 images are built for both standard and cloud variants.

src/axolotl/cli/cloud/modal_.py (1)

85-85: LGTM! Modal cloud default image updated to latest version.

The default Docker tag has been updated to use PyTorch 2.9.1 with CUDA 12.8, aligning with the deprecation of PyTorch 2.7.1. Users can still override this via config.docker_tag if needed.
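
The override behavior mentioned here can be pictured roughly as follows. This is only a sketch under assumptions: the actual logic in modal_.py is not shown in this PR page, `resolve_docker_tag` and `DEFAULT_DOCKER_TAG` are hypothetical names, and `SimpleNamespace` stands in for the real config object.

```python
from types import SimpleNamespace

# Tag taken from this PR; the constant name itself is an assumption.
DEFAULT_DOCKER_TAG = "main-py3.11-cu128-2.9.1"


def resolve_docker_tag(config):
    """Return config.docker_tag when it is set, otherwise the default tag."""
    return getattr(config, "docker_tag", None) or DEFAULT_DOCKER_TAG


# No override: falls back to the default tag.
print(resolve_docker_tag(SimpleNamespace()))
# Explicit override: the configured tag wins.
print(resolve_docker_tag(SimpleNamespace(docker_tag="main-py3.11-cu128-2.8.0")))
```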

src/axolotl/cli/cloud/baseten/template/train_sft.py (1)

27-27: LGTM! Baseten training template updated to latest image.

The BASE_IMAGE has been updated to use PyTorch 2.9.1 with CUDA 12.8, consistent with the Modal cloud changes and the deprecation of PyTorch 2.7.1.

.github/workflows/tests-nightly.yml (3)

29-29: LGTM! PyTest matrix expanded to cover current PyTorch versions.

The matrix now tests PyTorch 2.8.0, 2.9.0, and 2.9.1, removing the deprecated 2.7.1 and providing comprehensive coverage of the 2.8.x and 2.9.x series.


102-115: LGTM! Docker e2e test matrix updated consistently.

The matrix has been migrated to CUDA 12.8.1 with PyTorch 2.8.0 and 2.9.1, removing the deprecated CUDA 12.6.3/PyTorch 2.7.1 configuration.


151-157: LGTM! Multi-GPU e2e tests updated to latest version.

The multi-GPU test configuration now uses CUDA 12.8.1 with PyTorch 2.9.1, ensuring multi-GPU scenarios are tested with the latest supported version.

.github/workflows/main.yml (1)

137-148: Verify the intentional omission of PyTorch 2.9.0 in this job.

The build-axolotl-cloud-no-tmux job matrix only includes PyTorch 2.8.0 and 2.9.1, while build-axolotl and build-axolotl-cloud jobs include 2.8.0, 2.9.0, and 2.9.1. This creates an asymmetry where the base image for PyTorch 2.9.0 will be built, but the cloud-no-tmux variant will not be available for that version.

If this is intentional for cost/time savings, consider documenting it. Otherwise, add the 2.9.0 entry for consistency:

🔎 Suggested addition for consistency
           - cuda: 128
             cuda_version: 12.8.1
             python_version: "3.11"
             pytorch: 2.8.0
             axolotl_extras:
             is_latest:
+          - cuda: 128
+            cuda_version: 12.8.1
+            python_version: "3.11"
+            pytorch: 2.9.0
+            axolotl_extras:
+            is_latest:
           - cuda: 128
             cuda_version: 12.8.1
             python_version: "3.11"
             pytorch: 2.9.1
             axolotl_extras:
             is_latest:
.github/workflows/tests.yml (1)

58-58: LGTM!

The pytest matrix correctly includes all three new PyTorch versions (2.8.0, 2.9.0, 2.9.1), ensuring comprehensive test coverage across the supported versions.

Comment thread .github/workflows/tests.yml
github-actions Bot (Contributor) commented Dec 31, 2025

📖 Documentation Preview: https://6955e242c69db45c99a375e4--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 8f714f1

codecov Bot commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/axolotl/cli/cloud/modal_.py | 0.00% | 1 Missing ⚠️ |

winglian merged commit afe18ac into main Jan 1, 2026
25 of 27 checks passed
winglian deleted the deprecate-torch27x branch January 1, 2026 11:52