fix: pin torchvision per matrix entry to prevent ABI drift#3631
fix: pin torchvision per matrix entry to prevent ABI drift#3631ved1beta wants to merge 11 commits into
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis pull request introduces explicit torchvision version pinning across CI/CD workflows and Docker configurations. GitHub Actions workflows now define a Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/tests-nightly.yml (1)
152-163: Consider deduplicating the two env-export blocks.Non-blocking: extracting this into a shared composite action (or YAML anchor) would reduce drift risk between single-GPU and multigpu workflows.
Also applies to: 199-207
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/tests-nightly.yml around lines 152 - 163, The "Update env vars" step repeats an identical environment-export block elsewhere (also around the second env-export block referenced in the review), so extract the repeated echo lines that set BASE_TAG, PYTORCH_VERSION, TORCHVISION_VERSION, AXOLOTL_ARGS, AXOLOTL_EXTRAS, CUDA, N_GPUS, E2E_DOCKERFILE and NIGHTLY_BUILD into a single reusable unit (either a GitHub composite action or a YAML anchor/alias) and replace both occurrences with a call to that shared unit; update the step name (e.g., keep "Update env vars") to call the new composite or reference the anchor so matrix variables like matrix.python_version, matrix.cuda, matrix.pytorch, matrix.torchvision, matrix.axolotl_args, matrix.axolotl_extras, matrix.num_gpus, matrix.dockerfile and matrix.nightly_build are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docker/Dockerfile-base`:
- Line 14: The default ARG TORCHVISION_VERSION ("0.24.1") is incompatible with
the specified torch version (ARG TORCH_VERSION "2.1.2"); update the ARG
TORCHVISION_VERSION default to "0.16.2" wherever it appears (the top-level ARG
TORCHVISION_VERSION and the later occurrences around lines 48-49) so the
Dockerfile installs a torchvision version compatible with torch 2.1.2; ensure
any build ARG references or INSTALL steps that use TORCHVISION_VERSION keep the
new value.
---
Nitpick comments:
In @.github/workflows/tests-nightly.yml:
- Around line 152-163: The "Update env vars" step repeats an identical
environment-export block elsewhere (also around the second env-export block
referenced in the review), so extract the repeated echo lines that set BASE_TAG,
PYTORCH_VERSION, TORCHVISION_VERSION, AXOLOTL_ARGS, AXOLOTL_EXTRAS, CUDA,
N_GPUS, E2E_DOCKERFILE and NIGHTLY_BUILD into a single reusable unit (either a
GitHub composite action or a YAML anchor/alias) and replace both occurrences
with a call to that shared unit; update the step name (e.g., keep "Update env
vars") to call the new composite or reference the anchor so matrix variables
like matrix.python_version, matrix.cuda, matrix.pytorch, matrix.torchvision,
matrix.axolotl_args, matrix.axolotl_extras, matrix.num_gpus, matrix.dockerfile
and matrix.nightly_build are preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6f1eb114-b9c8-4b5b-8fb1-7c1818fdb576
📒 Files selected for processing (9)
.github/workflows/base.yml.github/workflows/multi-gpu-e2e.yml.github/workflows/tests-nightly.yml.github/workflows/tests.ymlcicd/Dockerfile-uv.jinjacicd/multigpu.pycicd/single_gpu.pydocker/Dockerfile-basedocker/Dockerfile-uv-base
| ARG TARGETARCH | ||
| ARG PYTHON_VERSION="3.11" | ||
| ARG PYTORCH_VERSION="2.1.2" | ||
| ARG TORCHVISION_VERSION="0.24.1" |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
From the official PyTorch compatibility table, which torchvision version is compatible with torch 2.1.2 (CUDA wheels)?
💡 Result:
torchvision version 0.16.2 is compatible with torch 2.1.2 (CUDA wheels).
Citations:
- 1: https://pytorch.org/get-started/previous-versions/
- 2: https://www.github.com/pytorch/vision
- 3: https://pypi.org/project/torchvision/
- 4: https://docs.pytorch.org/get-started/previous-versions/
TORCHVISION_VERSION default is incompatible with torch 2.1.2.
Line 14 defaults to 0.24.1, but the compatible version for torch 2.1.2 (Line 13) is 0.16.2. This breaks default/local builds when TORCHVISION_VERSION is not explicitly passed, resulting in an incompatible PyTorch + torchvision installation.
Suggested fix
ARG PYTORCH_VERSION="2.1.2"
-ARG TORCHVISION_VERSION="0.24.1"
+ARG TORCHVISION_VERSION="0.16.2"
ARG CUDA="128"Also applies to: 48-49
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docker/Dockerfile-base` at line 14, The default ARG TORCHVISION_VERSION
("0.24.1") is incompatible with the specified torch version (ARG TORCH_VERSION
"2.1.2"); update the ARG TORCHVISION_VERSION default to "0.16.2" wherever it
appears (the top-level ARG TORCHVISION_VERSION and the later occurrences around
lines 48-49) so the Dockerfile installs a torchvision version compatible with
torch 2.1.2; ensure any build ARG references or INSTALL steps that use
TORCHVISION_VERSION keep the new value.
|
|
||
| RUN uv pip install packaging==26.0 setuptools==78.1.1 | ||
| RUN uv pip install torchvision | ||
| RUN uv pip install torchvision==${TORCHVISION_VERSION} |
There was a problem hiding this comment.
Didn't realize we still had this here. Not sure if needed this line
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
| @@ -25,11 +27,13 @@ RUN git fetch origin +$GITHUB_REF && \ | |||
| RUN uv pip install packaging==26.0 setuptools==78.1.1 | |||
| RUN uv pip install torchvision | |||
There was a problem hiding this comment.
Check whether this needs to be pinned
|
|
||
| RUN uv pip install packaging==26.0 setuptools==78.1.1 | ||
| RUN uv pip install torchvision | ||
| RUN uv pip install torchvision==0.24.1 |
There was a problem hiding this comment.
I think this is wrong. I'm not sure whether torch vision has to be here, or whether you should make it match the ENV
f504fdd to
38927c3
Compare
|
Please recheck CI and rebase please |
# Conflicts: # cicd/Dockerfile-uv.jinja
fix the torchvision version fail
Summary by CodeRabbit