
fix: pin torchvision per matrix entry to prevent ABI drift #3631

Open
ved1beta wants to merge 11 commits into axolotl-ai-cloud:main from ved1beta:fix/torchvision-pin

@ved1beta (Member) commented Apr 29, 2026

Fix the torchvision version failure.

Summary by CodeRabbit

  • Chores
    • Updated Docker build and CI/CD pipeline configurations to use explicit torchvision version pinning across different environments.
    • Introduced torchvision version specifications for various PyTorch configurations to ensure consistent dependency resolution during builds and testing.

@coderabbitai (Contributor, bot) commented Apr 29, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94c47663-47be-4fb7-aeb7-52f47166c4c6

📝 Walkthrough

This pull request introduces explicit torchvision version pinning across CI/CD workflows and Docker configurations. GitHub Actions workflows now define a matrix.torchvision dimension with specific versions (0.24.1, 0.25.0) tied to PyTorch configurations, propagate TORCHVISION_VERSION as an environment variable through workflow steps, and pass it as Docker build arguments. Corresponding Dockerfile configurations accept and use this version to pin the installed torchvision package instead of relying on unpinned dependency resolution.
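Because each matrix row now carries its own `torchvision` value, drift can be caught before any image is built by validating every row against a torch-to-torchvision pairing table. The following is a minimal sketch, assuming the matrix rows are available as Python dicts (for example after loading the workflow YAML); `TORCHVISION_FOR_TORCH` and `check_matrix` are hypothetical names, and the table contains only the two pairings this PR uses.

```python
# Known torch -> torchvision pairings. Only the two pairs used by this PR's
# workflow matrices are listed; add a row before adding a new PyTorch entry.
TORCHVISION_FOR_TORCH = {
    "2.9.1": "0.24.1",
    "2.10.0": "0.25.0",
}


def check_matrix(entries):
    """Return one message per matrix entry whose torchvision pin is unknown
    or does not match the expected version for its pytorch version."""
    errors = []
    for entry in entries:
        torch_version = entry.get("pytorch", "")
        vision_version = entry.get("torchvision", "")
        expected = TORCHVISION_FOR_TORCH.get(torch_version)
        if expected is None:
            errors.append(f"pytorch {torch_version}: no known torchvision pairing")
        elif vision_version != expected:
            errors.append(
                f"pytorch {torch_version}: torchvision {vision_version!r}, "
                f"expected {expected!r}"
            )
    return errors


if __name__ == "__main__":
    # Hypothetical rows using the matrix keys named in this PR.
    matrix = [
        {"pytorch": "2.9.1", "torchvision": "0.24.1"},
        {"pytorch": "2.10.0", "torchvision": "0.24.1"},  # drifted pin
    ]
    for message in check_matrix(matrix):
        print(message)
```

Keeping the table explicit, rather than deriving the torchvision version from the torch version, avoids guessing patch numbers at the cost of one extra line per new PyTorch release.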

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| CI Workflow Matrix Configuration | .github/workflows/base.yml, .github/workflows/multi-gpu-e2e.yml, .github/workflows/tests-nightly.yml, .github/workflows/tests.yml | Add matrix.torchvision dimension to job matrices with explicit versions (0.24.1 for PyTorch 2.9.1; 0.25.0 for PyTorch 2.10.0) and propagate TORCHVISION_VERSION through "Update env vars" steps. |
| Docker Base Images | docker/Dockerfile-base, docker/Dockerfile-uv-base | Introduce TORCHVISION_VERSION build argument (defaulting to "0.24.1" and "0.21.0" respectively) and pin torchvision package installation to this version. |
| CI Docker Template | cicd/Dockerfile-uv.jinja | Add configurable TORCHVISION_VERSION environment variable and update pip install to use version-pinned torchvision==${TORCHVISION_VERSION}. |
| Build Scripts | cicd/multigpu.py, cicd/single_gpu.py | Extend Docker build argument maps to include TORCHVISION_VERSION (sourced from environment, defaulting to "0.21.0") for template rendering. |
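The build-script change amounts to reading `TORCHVISION_VERSION` from the environment with a fallback. A minimal sketch, assuming a plain `Mapping` of environment variables; `docker_build_args` and its `strict` flag are hypothetical and not part of `cicd/multigpu.py` or `cicd/single_gpu.py`. It also treats an empty value as unset, on the assumption that a workflow step interpolating a missing matrix key writes an empty string rather than omitting the variable.

```python
import os
from collections.abc import Mapping

# Fallback the PR's build scripts use when TORCHVISION_VERSION is absent.
DEFAULT_TORCHVISION_VERSION = "0.21.0"


def docker_build_args(env: Mapping, strict: bool = False) -> dict:
    """Return the torchvision part of a Docker build-arg map (hypothetical).

    An empty string counts as unset. With strict=False the PR's fallback is
    used; with strict=True a missing value raises KeyError instead.
    """
    version = env.get("TORCHVISION_VERSION", "").strip()
    if not version:
        if strict:
            raise KeyError("TORCHVISION_VERSION is not set")
        version = DEFAULT_TORCHVISION_VERSION
    return {"TORCHVISION_VERSION": version}


if __name__ == "__main__":
    print(docker_build_args(os.environ))
```

With `strict=True`, a matrix entry that forgets `torchvision` fails loudly instead of silently building against the `0.21.0` fallback, which is the kind of mismatch this PR sets out to prevent.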

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Possibly related PRs

  • #2764: Modifies the same base.yml workflow and Docker base/uv Dockerfiles for torchvision version management and build configuration.
  • #3268: Updates PyTorch to 2.9.1 and modifies docker/Dockerfile-base alongside the same CI/workflow changes.
  • #3550: Modifies CI workflow matrices and docker/Dockerfile-uv-base with related dependency version adjustments.

Suggested labels

ready to merge

Suggested reviewers

  • djsaunde
  • NanoCode012
  • SalmanMohammadi
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: pinning torchvision versions across matrix configurations to prevent ABI incompatibilities. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai (Contributor, bot) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/tests-nightly.yml (1)

152-163: Consider deduplicating the two env-export blocks.

Non-blocking: extracting this into a shared composite action (or YAML anchor) would reduce drift risk between single-GPU and multigpu workflows.

Also applies to: 199-207
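One alternative to a composite action or YAML anchor is a small script that both workflows call, so the list of exported variables lives in one place. A minimal sketch, assuming the step passes the matrix as JSON via `toJSON(matrix)`; the script, `ENV_FROM_MATRIX`, and its subset of variable names are hypothetical, and derived values such as `BASE_TAG` are left out.

```python
import json
import os
import sys
from collections.abc import Mapping

# Hypothetical single source of truth: exported variable -> matrix key.
ENV_FROM_MATRIX = {
    "PYTORCH_VERSION": "pytorch",
    "TORCHVISION_VERSION": "torchvision",
    "CUDA": "cuda",
    "N_GPUS": "num_gpus",
}


def env_lines(matrix: Mapping) -> list:
    """Render NAME=value lines; a key missing from the matrix yields NAME=."""
    return [
        f"{name}={matrix.get(key, '')}" for name, key in ENV_FROM_MATRIX.items()
    ]


def export_to_github_env(matrix: Mapping) -> None:
    """Append the lines to the file whose path GitHub Actions sets in GITHUB_ENV."""
    with open(os.environ["GITHUB_ENV"], "a", encoding="utf-8") as fh:
        for line in env_lines(matrix):
            fh.write(line + "\n")


if __name__ == "__main__":
    # Called from both workflows, e.g.:
    #   run: python export_env.py '${{ toJSON(matrix) }}'
    export_to_github_env(json.loads(sys.argv[1]))
```

A composite action keeps everything in YAML; the script trades that for being testable on its own.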

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/tests-nightly.yml around lines 152 - 163, The "Update env
vars" step repeats an identical environment-export block elsewhere (also around
the second env-export block referenced in the review), so extract the repeated
echo lines that set BASE_TAG, PYTORCH_VERSION, TORCHVISION_VERSION,
AXOLOTL_ARGS, AXOLOTL_EXTRAS, CUDA, N_GPUS, E2E_DOCKERFILE and NIGHTLY_BUILD
into a single reusable unit (either a GitHub composite action or a YAML
anchor/alias) and replace both occurrences with a call to that shared unit;
update the step name (e.g., keep "Update env vars") to call the new composite or
reference the anchor so matrix variables like matrix.python_version,
matrix.cuda, matrix.pytorch, matrix.torchvision, matrix.axolotl_args,
matrix.axolotl_extras, matrix.num_gpus, matrix.dockerfile and
matrix.nightly_build are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docker/Dockerfile-base`:
- Line 14: The default ARG TORCHVISION_VERSION ("0.24.1") is incompatible with
the specified torch version (ARG PYTORCH_VERSION "2.1.2"); update the ARG
TORCHVISION_VERSION default to "0.16.2" wherever it appears (the top-level ARG
TORCHVISION_VERSION and the later occurrences around lines 48-49) so the
Dockerfile installs a torchvision version compatible with torch 2.1.2; ensure
any build ARG references or INSTALL steps that use TORCHVISION_VERSION keep the
new value.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f1eb114-b9c8-4b5b-8fb1-7c1818fdb576

📥 Commits

Reviewing files that changed from the base of the PR and between ac77da9 and 0a30b0c.

📒 Files selected for processing (9)
  • .github/workflows/base.yml
  • .github/workflows/multi-gpu-e2e.yml
  • .github/workflows/tests-nightly.yml
  • .github/workflows/tests.yml
  • cicd/Dockerfile-uv.jinja
  • cicd/multigpu.py
  • cicd/single_gpu.py
  • docker/Dockerfile-base
  • docker/Dockerfile-uv-base

Comment thread docker/Dockerfile-base
ARG TARGETARCH
ARG PYTHON_VERSION="3.11"
ARG PYTORCH_VERSION="2.1.2"
ARG TORCHVISION_VERSION="0.24.1"
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

From the official PyTorch compatibility table, which torchvision version is compatible with torch 2.1.2 (CUDA wheels)?

💡 Result:

torchvision version 0.16.2 is compatible with torch 2.1.2 (CUDA wheels).


TORCHVISION_VERSION default is incompatible with torch 2.1.2.

Line 14 defaults to 0.24.1, but the compatible version for torch 2.1.2 (Line 13) is 0.16.2. This breaks default/local builds when TORCHVISION_VERSION is not explicitly passed, resulting in an incompatible PyTorch + torchvision installation.

Suggested fix
 ARG PYTORCH_VERSION="2.1.2"
-ARG TORCHVISION_VERSION="0.24.1"
+ARG TORCHVISION_VERSION="0.16.2"
 ARG CUDA="128"

Also applies to: 48-49
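A check like the following could catch this class of mismatch in CI by comparing a Dockerfile's `ARG` defaults against known pairings. A minimal sketch: `arg_defaults` and `check_defaults` are hypothetical helpers, the regex only understands simple `ARG NAME="value"` lines, and letting the last definition win ignores Docker's per-stage `ARG` scoping. The table lists only pairings named in this PR and review.

```python
import re
import sys
from typing import Dict, Optional

# torch -> torchvision pairings named in this PR and its review.
TORCHVISION_FOR_TORCH = {
    "2.1.2": "0.16.2",
    "2.9.1": "0.24.1",
    "2.10.0": "0.25.0",
}

# Simple `ARG NAME="value"` / `ARG NAME=value` lines only.
ARG_RE = re.compile(r'^\s*ARG\s+(\w+)="?([^"\s]*)"?\s*$')


def arg_defaults(dockerfile_text: str) -> Dict[str, str]:
    """Collect ARG defaults; a later definition overwrites an earlier one."""
    defaults = {}
    for line in dockerfile_text.splitlines():
        match = ARG_RE.match(line)
        if match:
            defaults[match.group(1)] = match.group(2)
    return defaults


def check_defaults(dockerfile_text: str) -> Optional[str]:
    """Return an error if the default torchvision does not pair with the
    default torch version; None when they match or torch is not in the table."""
    defaults = arg_defaults(dockerfile_text)
    torch_version = defaults.get("PYTORCH_VERSION", "")
    vision_version = defaults.get("TORCHVISION_VERSION")
    expected = TORCHVISION_FOR_TORCH.get(torch_version)
    if expected is not None and vision_version != expected:
        return (
            f"PYTORCH_VERSION={torch_version} expects "
            f"TORCHVISION_VERSION={expected}, found {vision_version}"
        )
    return None


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as fh:
        problem = check_defaults(fh.read())
    if problem:
        sys.exit(problem)  # prints to stderr and exits with status 1
```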


Comment thread cicd/Dockerfile-uv.jinja Outdated

RUN uv pip install packaging==26.0 setuptools==78.1.1
RUN uv pip install torchvision
RUN uv pip install torchvision==${TORCHVISION_VERSION}
Collaborator
Didn't realize we still had this here. Not sure this line is needed.

@codecov (bot) commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@NanoCode012 (Collaborator) left a comment

Thanks

Comment thread cicd/Dockerfile-uv.jinja Outdated
@@ -25,11 +27,13 @@ RUN git fetch origin +$GITHUB_REF && \
RUN uv pip install packaging==26.0 setuptools==78.1.1
RUN uv pip install torchvision
Collaborator

Check whether this needs to be pinned

@ved1beta requested a review from NanoCode012 on May 12, 2026 16:06
Comment thread cicd/Dockerfile-uv.jinja Outdated

RUN uv pip install packaging==26.0 setuptools==78.1.1
RUN uv pip install torchvision
RUN uv pip install torchvision==0.24.1
Collaborator

I think this is wrong. I'm not sure whether torchvision has to be here, or whether you should make it match the ENV.
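The reviewer's suggestion, pinning through the environment variable instead of a literal, can be enforced with a simple lint over the rendered Dockerfile. A minimal sketch, assuming the template pins via `${TORCHVISION_VERSION}` as in the `torchvision==${TORCHVISION_VERSION}` variant of this line shown earlier; `torchvision_pin_problems` is a hypothetical helper that flags both hard-coded versions and unpinned installs.

```python
import re
import sys

# `torchvision==<anything>`, and a bare `torchvision` not followed by `==`.
PIN_RE = re.compile(r"\btorchvision==(\S+)")
UNPINNED_RE = re.compile(r"\btorchvision\b(?!==)")
ENV_REF = "${TORCHVISION_VERSION}"


def torchvision_pin_problems(dockerfile_text: str) -> list:
    """Return (line number, message) pairs for install lines that either pin
    torchvision to a literal version or install it without an exact pin."""
    problems = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        if "install" not in line:
            continue
        for value in PIN_RE.findall(line):
            if value != ENV_REF:
                problems.append((lineno, f"hard-coded torchvision=={value}"))
        if UNPINNED_RE.search(line):
            problems.append((lineno, "torchvision installed without an exact pin"))
    return problems


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as fh:
        for lineno, message in torchvision_pin_problems(fh.read()):
            print(f"{sys.argv[1]}:{lineno}: {message}")
```

Routing the pin through the variable means the matrix stays the only place a torchvision version is chosen.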

@ved1beta force-pushed the fix/torchvision-pin branch from f504fdd to 38927c3 on May 13, 2026 14:50
Your Name added 2 commits May 14, 2026 15:40
@NanoCode012 (Collaborator)

Please recheck CI and rebase.

# Conflicts:
#	cicd/Dockerfile-uv.jinja