deprecate torch 2.7.1 #3339
📝 Walkthrough
Multiple GitHub Actions workflow files (.github/workflows/*) are updated to remove legacy CUDA 12.6.3 and PyTorch 2.7.1 matrix entries, standardizing on CUDA 12.8.1 and PyTorch 2.8.0–2.9.1. Documentation and base image references are updated correspondingly with new Docker tag versions.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
78ffef5 to 97c93e4
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/docker.qmd (1)
12-12: Update Blackwell GPU guidance to reflect PyTorch 2.9.1.
This callout still references PyTorch 2.7.1, which is being deprecated in this PR. Based on the changes in `docs/installation.qmd` (lines 29 and 114), Blackwell GPUs should now use PyTorch 2.9.1 with CUDA 12.8.
🔎 Proposed fix
-For Blackwell GPUs, please use the tags with PyTorch 2.7.1 and CUDA 12.8.
+For Blackwell GPUs, please use the tags with PyTorch 2.9.1 and CUDA 12.8.
.github/workflows/main.yml (1)
60-60: Add missing base image configuration for pytorch 2.9.0.
The main.yml workflow requires base images for cuda 128 with pytorch 2.8.0, 2.9.0, and 2.9.1. However, base.yml only builds images for 2.8.0 and 2.9.1. The BASE_TAG reference for pytorch 2.9.0 (line 27 in main.yml) will fail because the corresponding base image `main-base-py3.11-cu128-2.9.0` does not exist.
Add a matrix entry to base.yml for cuda 128 with pytorch 2.9.0, or remove the 2.9.0 configuration from main.yml.
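The missing-base-image problem described above can be caught mechanically. The following is a minimal sketch, not code from the repository: the helper names `base_tag` and `missing_base_images` are hypothetical, the tag pattern is inferred from the single tag quoted in the comment, and the matrices are hand-written stand-ins for the workflow files rather than parsed YAML.

```python
def base_tag(entry: dict) -> str:
    """Build a base image tag following the pattern quoted in the review
    (main-base-py3.11-cu128-2.9.0). The pattern is an inference from that
    one example, not a confirmed rule."""
    return f"main-base-py{entry['python_version']}-cu{entry['cuda']}-{entry['pytorch']}"


def missing_base_images(main_matrix: list[dict], base_matrix: list[dict]) -> list[str]:
    """Return tags that the main matrix needs but the base matrix does not build."""
    built = {base_tag(entry) for entry in base_matrix}
    needed = {base_tag(entry) for entry in main_matrix}
    return sorted(needed - built)


# Illustrative matrices mirroring the versions named in the comment.
base_matrix = [
    {"cuda": 128, "python_version": "3.11", "pytorch": "2.8.0"},
    {"cuda": 128, "python_version": "3.11", "pytorch": "2.9.1"},
]
main_matrix = base_matrix + [
    {"cuda": 128, "python_version": "3.11", "pytorch": "2.9.0"},
]

print(missing_base_images(main_matrix, base_matrix))
# ['main-base-py3.11-cu128-2.9.0']
```

A check of this kind could run before the image build jobs so that a dangling BASE_TAG fails fast instead of at pull time.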
🧹 Nitpick comments (1)
.github/workflows/tests.yml (1)
306-318: Consider adding PyTorch 2.9.0 to the e2e test matrix.
The e2e tests only cover PyTorch 2.8.0 and 2.9.1, omitting 2.9.0 that is tested in unit tests and built in the main workflow. While this may be intentional to reduce CI costs, it creates a gap where 2.9.0 Docker images are built and unit-tested but not e2e-tested.
If comprehensive coverage is desired, add a 2.9.0 entry:
🔎 Suggested addition for complete coverage
   - cuda: 128
     cuda_version: 12.8.1
     python_version: "3.11"
     pytorch: 2.8.0
     num_gpus: 1
     gpu_type: "B200"
     axolotl_extras: fbgemm-gpu
+  - cuda: 128
+    cuda_version: 12.8.1
+    python_version: "3.11"
+    pytorch: 2.9.0
+    num_gpus: 1
+    axolotl_extras:
   - cuda: 128
     cuda_version: 12.8.1
     python_version: "3.11"
     pytorch: 2.9.1
     num_gpus: 1
     axolotl_extras:
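The coverage gap noted in this nitpick amounts to a set difference between the PyTorch versions each job tests. As a rough sketch, assuming the per-job version sets have already been extracted from the workflow files (the job names `pytest` and `docker-e2e` and the function `coverage_gaps` are illustrative, not taken from the repository):

```python
def coverage_gaps(jobs: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each job, list the versions covered by some other job but not by it."""
    all_versions = set().union(*jobs.values())
    gaps = {}
    for name, versions in jobs.items():
        missing = all_versions - versions
        if missing:
            gaps[name] = sorted(missing)
    return gaps


# Version sets as described in the review comment; job names are placeholders.
jobs = {
    "pytest": {"2.8.0", "2.9.0", "2.9.1"},
    "docker-e2e": {"2.8.0", "2.9.1"},
}

print(coverage_gaps(jobs))
# {'docker-e2e': ['2.9.0']}
```

Whether a reported gap is a bug or a deliberate cost saving remains a judgment call; the sketch only makes the asymmetry visible.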
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- .github/workflows/base.yml
- .github/workflows/main.yml
- .github/workflows/multi-gpu-e2e.yml
- .github/workflows/nightlies.yml
- .github/workflows/tests-nightly.yml
- .github/workflows/tests.yml
- README.md
- docs/docker.qmd
- docs/installation.qmd
- src/axolotl/cli/cloud/baseten/template/train_sft.py
- src/axolotl/cli/cloud/modal_.py
💤 Files with no reviewable changes (1)
- .github/workflows/base.yml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-02T02:56:20.788Z
Learnt from: NanoCode012
Repo: axolotl-ai-cloud/axolotl PR: 2854
File: README.md:73-77
Timestamp: 2025-07-02T02:56:20.788Z
Learning: For Axolotl Docker commands, the `--ipc=host` flag should be included by default to prevent shared memory failures that commonly occur with PyTorch DataLoaders and multiprocessing during machine learning training workflows.
Applied to files:
docs/installation.qmd
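To illustrate the learning above, here is a small sketch that assembles a `docker run` argument list with `--ipc=host` always included. The function name `docker_run_command` and the image name are placeholders, not the project's actual command or tag; `--gpus`, `--ipc`, `--rm`, and `-it` are standard Docker CLI options.

```python
import shlex


def docker_run_command(image: str, gpus: str = "all") -> list[str]:
    """Build a docker run argv that shares the host IPC namespace, so that
    PyTorch DataLoader workers are not limited by the container's default
    shared memory size."""
    return ["docker", "run", "--gpus", gpus, "--ipc=host", "--rm", "-it", image]


# Placeholder image name; substitute a real tag from the Docker docs.
cmd = docker_run_command("example/axolotl:placeholder-tag")
print(shlex.join(cmd))
# docker run --gpus all --ipc=host --rm -it example/axolotl:placeholder-tag
```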
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: PyTest (3.11, 2.9.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.9.1)
- GitHub Check: preview
- GitHub Check: PyTest (3.11, 2.9.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.9.0)
- GitHub Check: PyTest (3.11, 2.8.0)
🔇 Additional comments (15)
README.md (1)
80-80: LGTM! PyTorch minimum version updated correctly.
The requirement update to PyTorch ≥2.8.0 aligns with the PR's objective to deprecate PyTorch 2.7.1 and standardize on newer versions.
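A minimum-version requirement such as PyTorch ≥2.8.0 has to be compared numerically, since a plain string comparison would rank "2.10.0" below "2.8.0". The sketch below shows one way to do that with tuples; `parse_version` and `meets_minimum` are hypothetical helpers, not part of Axolotl, and they only handle plain release strings with an optional local suffix such as `+cu128`. For pre-release strings, `packaging.version.Version` is the more robust choice.

```python
MIN_TORCH = (2, 8, 0)  # minimum stated in the README update


def parse_version(version: str) -> tuple[int, ...]:
    """Parse '2.9.1' or '2.9.1+cu128' into (2, 9, 1).

    Assumes a purely numeric major.minor.patch release segment.
    """
    release = version.split("+", 1)[0]
    return tuple(int(part) for part in release.split(".")[:3])


def meets_minimum(version: str, minimum: tuple[int, ...] = MIN_TORCH) -> bool:
    """Return True when the version is at least the minimum."""
    return parse_version(version) >= minimum


print(meets_minimum("2.7.1"))        # False
print(meets_minimum("2.8.0+cu128"))  # True
print(meets_minimum("2.10.0"))       # True
```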
.github/workflows/multi-gpu-e2e.yml (1)
32-45: LGTM! Multi-GPU test matrix updated to latest PyTorch versions.
The matrix now tests PyTorch 2.8.0 and 2.9.1 with CUDA 12.8.1, removing the deprecated 2.7.1 configuration. This provides good coverage of supported PyTorch versions.
docs/docker.qmd (2)
35-36: LGTM! Base image tag examples updated correctly.
The tag examples now reflect PyTorch 2.8.0 and 2.9.1 with CUDA 12.8, consistent with the deprecation of PyTorch 2.7.1.
74-79: LGTM! Main image tag examples updated correctly.
The tag examples have been updated to PyTorch 2.8.0 and 2.9.1. The version example update from 0.10.1 to 0.12.0 appears to bring the documentation in line with a more recent release.
docs/installation.qmd (2)
29-29: LGTM! Blackwell GPU guidance updated correctly.
The PyTorch version requirement for Blackwell GPUs has been updated to 2.9.1 with CUDA 12.8, consistent with the deprecation of PyTorch 2.7.1.
114-114: LGTM! Blackwell Docker image recommendation updated correctly.
The Docker image tags for Blackwell GPUs now reference PyTorch 2.9.1 with CUDA 12.8, maintaining consistency with the installation guidance on line 29.
.github/workflows/nightlies.yml (2)
15-24: LGTM! Nightly build matrix updated to current PyTorch versions.
The build-axolotl matrix has been updated to build images for PyTorch 2.8.0 and 2.9.1 with CUDA 12.8.1, removing the deprecated PyTorch 2.7.1 configuration.
67-76: LGTM! Cloud image matrix updated consistently.
The build-axolotl-cloud matrix mirrors the build-axolotl updates, ensuring consistent PyTorch 2.8.0 and 2.9.1 images are built for both standard and cloud variants.
src/axolotl/cli/cloud/modal_.py (1)
85-85: LGTM! Modal cloud default image updated to latest version.
The default Docker tag has been updated to use PyTorch 2.9.1 with CUDA 12.8, aligning with the deprecation of PyTorch 2.7.1. Users can still override this via `config.docker_tag` if needed.
src/axolotl/cli/cloud/baseten/template/train_sft.py (1)
27-27: LGTM! Baseten training template updated to latest image.
The BASE_IMAGE has been updated to use PyTorch 2.9.1 with CUDA 12.8, consistent with the Modal cloud changes and the deprecation of PyTorch 2.7.1.
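Both cloud entry points above default to a PyTorch 2.9.1 / CUDA 12.8 image, and the Modal one can be overridden through `config.docker_tag`. The fallback pattern can be sketched as follows. This is an illustration under assumptions, not the code in `modal_.py`: the `CloudConfig` class, the `resolve_docker_tag` function, and the tag strings are all hypothetical; only the `docker_tag` attribute name comes from the review comment.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default; the real tag string is not shown in the review.
DEFAULT_DOCKER_TAG = "placeholder-py3.11-cu128-2.9.1"


@dataclass
class CloudConfig:
    """Stand-in for the cloud config object; only docker_tag is modeled."""
    docker_tag: Optional[str] = None


def resolve_docker_tag(config: CloudConfig) -> str:
    """Prefer the user-supplied tag, falling back to the default image."""
    return config.docker_tag or DEFAULT_DOCKER_TAG


print(resolve_docker_tag(CloudConfig()))
# placeholder-py3.11-cu128-2.9.1
print(resolve_docker_tag(CloudConfig(docker_tag="placeholder-py3.11-cu128-2.8.0")))
# placeholder-py3.11-cu128-2.8.0
```

Keeping the default in one constant means a future version bump touches a single line, while pinned user configs keep working unchanged.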
.github/workflows/tests-nightly.yml (3)
29-29: LGTM! PyTest matrix expanded to cover current PyTorch versions.
The matrix now tests PyTorch 2.8.0, 2.9.0, and 2.9.1, removing the deprecated 2.7.1 and providing comprehensive coverage of the 2.8.x and 2.9.x series.
102-115: LGTM! Docker e2e test matrix updated consistently.
The matrix has been migrated to CUDA 12.8.1 with PyTorch 2.8.0 and 2.9.1, removing the deprecated CUDA 12.6.3/PyTorch 2.7.1 configuration.
151-157: LGTM! Multi-GPU e2e tests updated to latest version.
The multi-GPU test configuration now uses CUDA 12.8.1 with PyTorch 2.9.1, ensuring multi-GPU scenarios are tested with the latest supported version.
.github/workflows/main.yml (1)
137-148: Verify the intentional omission of PyTorch 2.9.0 in this job.
The `build-axolotl-cloud-no-tmux` job matrix only includes PyTorch 2.8.0 and 2.9.1, while `build-axolotl` and `build-axolotl-cloud` jobs include 2.8.0, 2.9.0, and 2.9.1. This creates an asymmetry where the base image for PyTorch 2.9.0 will be built, but the cloud-no-tmux variant will not be available for that version.
If this is intentional for cost/time savings, consider documenting it. Otherwise, add the 2.9.0 entry for consistency:
🔎 Suggested addition for consistency
   - cuda: 128
     cuda_version: 12.8.1
     python_version: "3.11"
     pytorch: 2.8.0
     axolotl_extras:
     is_latest:
+  - cuda: 128
+    cuda_version: 12.8.1
+    python_version: "3.11"
+    pytorch: 2.9.0
+    axolotl_extras:
+    is_latest:
   - cuda: 128
     cuda_version: 12.8.1
     python_version: "3.11"
     pytorch: 2.9.1
     axolotl_extras:
     is_latest:
.github/workflows/tests.yml (1)
58-58: LGTM!
The pytest matrix correctly includes all three new PyTorch versions (2.8.0, 2.9.0, 2.9.1), ensuring comprehensive test coverage across the supported versions.
📖 Documentation Preview: https://6955e242c69db45c99a375e4--resonant-treacle-0fd729.netlify.app
Deployed on Netlify from commit 8f714f1
Codecov Report❌ Patch coverage is
Description
Summary by CodeRabbit
Chores
Documentation