fix: pin torchvision per matrix entry to prevent ABI drift by ved1beta · Pull Request #3653 · axolotl-ai-cloud/axolotl

ved1beta · 2026-05-14T11:14:58Z

Fix the torchvision version failure.

Summary by CodeRabbit

Chores
- Updated Docker build and CI/CD pipeline configurations to use explicit torchvision version pinning across different environments.
- Introduced torchvision version specifications for various PyTorch configurations to ensure consistent dependency resolution during builds and testing.

coderabbitai · 2026-05-14T11:15:05Z

📝 Walkthrough

This PR introduces explicit torchvision version pinning across CI workflows and Docker build matrices, plus full Q-GaLore optimizer support including schema definitions, validation, parameter group building, trainer integration, and comprehensive tests.

Changes

TorchVision Version Pinning Infrastructure

Layer / File(s)	Summary
CI workflow matrix updates for torchvision `.github/workflows/base.yml`, `.github/workflows/multi-gpu-e2e.yml`, `.github/workflows/tests-nightly.yml`, `.github/workflows/tests.yml`	All CI test and build workflows add torchvision version to their build matrices per CUDA/PyTorch combination and export `TORCHVISION_VERSION` to the environment for downstream Docker builds.
Dockerfile base images with torchvision pinning `docker/Dockerfile-base`, `docker/Dockerfile-uv`, `docker/Dockerfile-uv-base`, `cicd/Dockerfile-uv.jinja`	Base Docker images now declare `TORCHVISION_VERSION` build arguments, pin both torch and torchvision during installation with CUDA-specific suffixes, add `VIRTUAL_ENV` setup, and verify `torchvision.ops.nms` post-install.
CI/CD build scripts pass torchvision template args `cicd/single_gpu.py`, `cicd/multigpu.py`	Single-GPU and multi-GPU CI build scripts populate `TORCHVISION_VERSION` from environment variables and pass it through Dockerfile Jinja template rendering.
Project dependency declaration updated `pyproject.toml`	Added explicit `torchvision>=0.24.1` dependency to match base image pinning.
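Per-entry pinning only prevents drift if every matrix row pairs a torch release with its matching torchvision release. Below is a minimal sketch of a check for that. It assumes the convention visible in this PR's own pairs (2.6.0/0.21.0, 2.9.1/0.24.1, 2.10.0/0.25.0): for torch 2.x, the torchvision minor version is the torch minor version plus 15. The row layout and helper names are hypothetical and are not axolotl's CI code.

```python
"""Check that each CI matrix row pairs torch with its matching torchvision."""


def parse_version(version: str) -> tuple:
    # Drop a local label such as "+cu128" and keep the numeric components.
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def expected_torchvision_minor(torch_version: str) -> int:
    # Assumed convention for torch 2.x: torch 2.N pairs with torchvision 0.(N + 15).
    major, minor = parse_version(torch_version)[:2]
    if major != 2:
        raise ValueError(f"convention only assumed for torch 2.x, got {torch_version}")
    return minor + 15


def mismatched_entries(matrix: list) -> list:
    """Return the rows whose torchvision major.minor does not match torch."""
    bad = []
    for row in matrix:
        tv_major, tv_minor = parse_version(row["torchvision"])[:2]
        if tv_major != 0 or tv_minor != expected_torchvision_minor(row["pytorch"]):
            bad.append(row)
    return bad


if __name__ == "__main__":
    # Hypothetical rows; the last one pairs torch 2.10 with a 2.9-era torchvision.
    matrix = [
        {"cuda": "128", "pytorch": "2.9.1", "torchvision": "0.24.1"},
        {"cuda": "130", "pytorch": "2.10.0", "torchvision": "0.25.0"},
        {"cuda": "128", "pytorch": "2.10.0", "torchvision": "0.24.1"},
    ]
    for row in mismatched_entries(matrix):
        print("torch/torchvision mismatch:", row)
```

Only major.minor is compared; patch levels are not checked by this sketch.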

Q-GaLore Optimizer Integration

Layer / File(s)	Summary
Q-GaLore schema definitions and enum `src/axolotl/utils/schemas/enums.py`, `src/axolotl/utils/schemas/training.py`	Added `q_galore_adamw8bit` to `CustomSupportedOptimizers` enum; added ten optional Q-GaLore hyperparameter fields to `HyperparametersConfig` (rank, projection gap, scale, projection type/quantization settings, cosine threshold, gamma growth, queue size).
Q-GaLore configuration validation `src/axolotl/utils/schemas/validation.py`	`OptimizationValidationMixin.check_qgalore` enforces Q-GaLore constraints: rejects adapter/deepspeed combinations, requires FSDP2 with `use_orig_params`, warns on missing mixed-precision, auto-populates `optim_target_modules` with attention/MLP linears.
Q-GaLore optimizer utility functions `src/axolotl/utils/optimizers/qgalore.py`	New module provides `patch_q_galore_for_modern_bnb()` to adapt Q-GaLore's optimizer signature to newer bitsandbytes, and `build_qgalore_param_groups()` to split trainable parameters into projected vs. plain groups based on dimensionality and target module matching.
Trainer builder Q-GaLore optimizer wiring `src/axolotl/core/builders/base.py`	`_configure_optimizer()` detects `q_galore_adamw8bit`, applies modern-bnb patching, instantiates `QGaLoreAdamW8bit`, and builds parameter groups via `build_qgalore_param_groups()` with configured hyperparameters. Also refactors `get_callbacks()` to pass `gc_steps` directly to `GCCallback`.
Q-GaLore validation unit tests `tests/utils/schemas/validation/test_qgalore.py`	Test suite validates adapter rejection, FSDP version constraints, and default field auto-population for bf16/rank/proj_bits.
Q-GaLore end-to-end training test `tests/e2e/test_optimizers.py`	End-to-end test trains with `q_galore_adamw8bit` optimizer (bf16, small ranks), verifies output artifacts, and asserts optimizer class contains `"AdamW8bit"`.
Q-GaLore optimizer documentation `docs/optimizers.qmd`	Describes Q-GaLore's INT4 projection matrix behavior, installation, configuration constraints (full fine-tuning, FSDP2 required), and includes complete YAML example with all hyperparameters.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

axolotl-ai-cloud/axolotl#3457: Both PRs extend CustomSupportedOptimizers and add optimizer-specific branches/validation inside src/axolotl/core/builders/base.py and src/axolotl/utils/schemas/validation.py, with corresponding end-to-end coverage in tests/e2e/test_optimizers.py (FlashOptim in #3457 vs Q-GaLore in the main PR).
axolotl-ai-cloud/axolotl#3017: Both PRs touch src/axolotl/core/builders/base.py in the custom optimizer configuration path—main PR adds Q-GaLore (q_galore_adamw8bit) handling while retrieved PR removes ao_adamw_4bit/8bit custom optimizer support—so the changes are related at the code-level.
axolotl-ai-cloud/axolotl#3268: Both PRs modify docker/Dockerfile-base/.github/workflows/base.yml around the PYTORCH_VERSION selection (pinning/updating to Torch 2.9.1), though the main PR also adds TORCHVISION_VERSION plumbing and torchvision install/verification.

Suggested reviewers

winglian
SalmanMohammadi

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'fix: pin torchvision per matrix entry to prevent ABI drift' directly and specifically describes the main change across the changeset—adding explicit torchvision version pinning to CI workflows and Dockerfile templates to prevent ABI compatibility issues.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.


Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (4)

src/axolotl/utils/optimizers/qgalore.py (2)
30-30: 💤 Low value

Consider using tuple unpacking for clarity.

Static analysis suggests replacing tuple concatenation with unpacking syntax for improved readability:
♻️ Suggested refactor
-            lambda *a, **kw: bw(
-                *(a[:7] + (0.0, 0.0) + a[7:] if len(a) == 15 else a), **kw
-            )
+            lambda *a, **kw: bw(
+                *(*a[:7], 0.0, 0.0, *a[7:]) if len(a) == 15 else a, **kw
+            )
Apply similar change to line 35.
Also applies to: 35-35
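Both spellings build the same argument tuple, so the refactor does not change behavior. Below is a minimal sketch of the shim pattern. It assumes, as the diff implies, that the wrapped bitsandbytes function `bw` now takes two extra positional arguments after the seventh, which legacy 15-argument callers omit. `legacy_shim` and `fake_bw` are hypothetical stand-ins, and the source does not say what the two `0.0` values mean.

```python
from typing import Any, Callable


def legacy_shim(
    bw: Callable[..., Any],
    legacy_arity: int = 15,
    insert_at: int = 7,
    defaults: tuple = (0.0, 0.0),
) -> Callable[..., Any]:
    """Wrap `bw` so old-arity positional calls get `defaults` spliced in."""

    def wrapper(*a: Any, **kw: Any) -> Any:
        if len(a) == legacy_arity:
            # Unpacking form; equivalent to a[:insert_at] + defaults + a[insert_at:].
            a = (*a[:insert_at], *defaults, *a[insert_at:])
        return bw(*a, **kw)

    return wrapper


def fake_bw(*args: Any, **kwargs: Any) -> tuple:
    # Hypothetical stand-in for the bitsandbytes function: echoes what it received.
    return args, kwargs


if __name__ == "__main__":
    wrapped = legacy_shim(fake_bw)
    args, _ = wrapped(*range(15))
    print(len(args), args[7:9])
```

Calls made with any other number of positional arguments pass through unchanged.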
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/utils/optimizers/qgalore.py` at line 30, The tuple is being built
via concatenation (a[:7] + (0.0, 0.0) + a[7:]) which is less readable; replace
that with explicit tuple unpacking so the call uses ( *a[:7], 0.0, 0.0, *a[7:] )
instead (do the same refactor at the similar occurrence on line 35) — update the
expression inside the starred argument in qgalore.py where this concatenation
appears to use unpacking for clarity.
64-64: 💤 Low value

Verify substring matching behavior for target modules.

The filter any(t in name for t in target_modules) performs substring matching, so target_modules = ["attn", "mlp"] will match parameter names like "model.layers.0.self_attn.q_proj.weight" and "model.layers.0.mlp.gate_proj.weight". This is likely intentional for flexibility, but it could also match unintended parameters (e.g., if a parameter name happens to contain "mlp" as part of a longer word).

Consider documenting this substring matching behavior in the docstring or adding an example showing what patterns are expected to match.
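The following minimal sketch makes the difference concrete. It implements the split described for `build_qgalore_param_groups()`: 2-D parameters whose names match a target go to the projected group, and everything else goes to the plain group. It uses `(name, shape)` pairs instead of real `torch.nn.Parameter` objects. The `match_segments` flag is a hypothetical option illustrating the boundary-aware alternative; it is not axolotl's actual signature. Segment matching changes semantics: `"attn"` matches `self_attn` as a substring but not as a dotted segment, so targets would have to name exact modules such as `q_proj`.

```python
from typing import Iterable, Sequence


def name_matches(name: str, targets: Sequence[str], match_segments: bool) -> bool:
    if match_segments:
        # Boundary-aware: a target must equal one dotted component of the name.
        segments = set(name.split("."))
        return any(t in segments for t in targets)
    # Substring matching, as in `any(t in name for t in target_modules)`.
    return any(t in name for t in targets)


def split_param_groups(
    params: Iterable[tuple],
    targets: Sequence[str],
    match_segments: bool = False,
) -> tuple:
    """Return (projected, plain) parameter names from (name, shape) pairs."""
    projected, plain = [], []
    for name, shape in params:
        if len(shape) == 2 and name_matches(name, targets, match_segments):
            projected.append(name)
        else:
            plain.append(name)
    return projected, plain
```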
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/axolotl/utils/optimizers/qgalore.py` at line 64, The current filter uses
substring matching via any(t in name for t in target_modules) (seen in the
p.dim() == 2 and any(...) check), which can unintentionally match parameters
containing those substrings; either document this substring-matching behavior in
the function/class docstring that builds parameter groups (mention
target_modules, name, and p.dim() condition) or tighten the check to
boundary-aware matching (e.g., match against module name tokens via
name.split('.') or use regex with word boundaries) so only intended modules like
"attn" or "mlp" match their module segments rather than any substring.
cicd/Dockerfile-uv.jinja (1)
28-28: 💤 Low value

Clarify the rationale for uninstalling causal_conv1d.

Line 28 adds uv pip uninstall causal_conv1d before the main installation. This appears to prevent conflicts during the subsequent editable install, but the reason isn't documented. Consider adding a comment explaining why this pre-uninstall is necessary (e.g., version conflicts, rebuild requirements, or ABI compatibility).
📝 Suggested comment
+# Uninstall causal_conv1d to avoid version conflicts during editable install
 RUN uv pip uninstall causal_conv1d
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cicd/Dockerfile-uv.jinja` at line 28, Add a brief explanatory comment above
the RUN uv pip uninstall causal_conv1d line in Dockerfile-uv.jinja that states
why causal_conv1d is being uninstalled (e.g., to avoid version/ABI conflicts
with the subsequent editable install, force a rebuild, or remove preinstalled
incompatible wheel), referencing the RUN uv pip uninstall causal_conv1d command
so reviewers understand the rationale and can remove it safely if requirements
change.
docker/Dockerfile-uv-base (1)
11-12: 💤 Low value

Consider updating default versions to match workflow values.

The default PYTORCH_VERSION="2.6.0" and TORCHVISION_VERSION="0.21.0" don't match the most common versions used in the workflow matrices (e.g., PyTorch 2.9.1/2.10.0 with torchvision 0.24.1/0.25.0). While these defaults are overridden by build-args in CI, aligning them with the primary workflow versions would make local builds and debugging more consistent.
📝 Suggested default version alignment
-ARG PYTORCH_VERSION="2.6.0"
-ARG TORCHVISION_VERSION="0.21.0"
+ARG PYTORCH_VERSION="2.9.1"
+ARG TORCHVISION_VERSION="0.24.1"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker/Dockerfile-uv-base` around lines 11 - 12, Update the default ARG
values in the Dockerfile-uv-base so local builds match the workflow matrix:
change ARG PYTORCH_VERSION and ARG TORCHVISION_VERSION from "2.6.0"/"0.21.0" to
the primary workflow versions (e.g., "2.10.0" for PYTORCH_VERSION and "0.25.0"
for TORCHVISION_VERSION) so local debugging uses the same defaults as CI; ensure
you only update the values for the ARGs named PYTORCH_VERSION and
TORCHVISION_VERSION.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/base.yml:
- Around line 41-42: Update the inline comment above the "platforms:
\"linux/amd64\"" setting to state that the cu128/cu130 wheels for torchvision
0.24.1 are unavailable for any platform rather than only aarch64; replace the
current comment `# arm64 disabled: torchvision 0.24.1+cu128 has no aarch64
wheel` with something like `# torchvision 0.24.1 does not have cu128/cu130
wheels available` so the reason for restricting platforms is accurate and
unambiguous.

In `@cicd/single_gpu.py`:
- Line 26: Update the CI default PyTorch/TorchVision versions to match
pyproject.toml constraints: change the default environment values used in the cicd
single- and multi-GPU configs so that PYTORCH_VERSION is "2.9.1" and
TORCHVISION_VERSION is "0.24.1"; locate the variables named PYTORCH_VERSION and
TORCHVISION_VERSION (e.g., the assignment line with "TORCHVISION_VERSION":
os.environ.get("TORCHVISION_VERSION", "0.21.0") in single_gpu.py and the
analogous line in multigpu.py) and update their fallback strings to "2.9.1" and
"0.24.1" respectively (also ensure Dockerfile-uv-base uses the same versions).

In `@src/axolotl/utils/schemas/validation.py`:
- Around line 934-945: The validator for q_galore_adamw8bit currently only
rejects fsdp_config.use_orig_params when it is explicitly False; update the
check in the block that calls cls._resolve_fsdp_version and reads fsdp_config so
that it requires use_orig_params to be explicitly True (i.e., treat None/absent
the same as False) by replacing the current boolean check with a strict is not
True condition and raise the same ValueError referencing
fsdp_config.use_orig_params and q_galore_adamw8bit.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f208ff9-2639-42d3-8d71-1fa5b33f315d

📥 Commits

Reviewing files that changed from the base of the PR and between d7cb1c9 and 4407920.

📒 Files selected for processing (19)

.github/workflows/base.yml
.github/workflows/multi-gpu-e2e.yml
.github/workflows/tests-nightly.yml
.github/workflows/tests.yml
cicd/Dockerfile-uv.jinja
cicd/multigpu.py
cicd/single_gpu.py
docker/Dockerfile-base
docker/Dockerfile-uv
docker/Dockerfile-uv-base
docs/optimizers.qmd
pyproject.toml
src/axolotl/core/builders/base.py
src/axolotl/utils/optimizers/qgalore.py
src/axolotl/utils/schemas/enums.py
src/axolotl/utils/schemas/training.py
src/axolotl/utils/schemas/validation.py
tests/e2e/test_optimizers.py
tests/utils/schemas/validation/test_qgalore.py

coderabbitai · 2026-05-14T14:30:50Z

+            # arm64 disabled: torchvision 0.24.1+cu128 has no aarch64 wheel
+            platforms: "linux/amd64"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check PyPI for torchvision 0.24.1 wheel availability by platform and CUDA version
echo "=== Checking torchvision 0.24.1 wheels ==="
curl -s https://pypi.org/pypi/torchvision/0.24.1/json | \
  jq -r '.urls[] | select(.packagetype == "bdist_wheel") | .filename' | \
  grep -E "(cu128|cu130)" | sort
echo ""
echo "=== Filtering for aarch64 wheels ==="
curl -s https://pypi.org/pypi/torchvision/0.24.1/json | \
  jq -r '.urls[] | select(.packagetype == "bdist_wheel") | .filename' | \
  grep -E "(cu128|cu130)" | grep aarch64 || echo "No aarch64 wheels found for cu128/cu130"

Repository: axolotl-ai-cloud/axolotl

Length of output: 189
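This result needs one caveat. The PyPI JSON API lists only files uploaded to PyPI, and PyPI does not accept PEP 440 local version labels such as `+cu128`. A `cu128|cu130` filter over that list is therefore expected to come back empty. CUDA-suffixed torchvision builds are published on the PyTorch wheel index (for example `https://download.pytorch.org/whl/cu128`) instead. Below is a minimal sketch of the same filter written as a pure function over wheel filenames, so it can be pointed at either source. The helper names are hypothetical.

```python
from typing import Collection, Iterable, Optional
from urllib.parse import unquote


def parse_wheel(filename: str) -> dict:
    """Split a wheel filename into public version, local label and platform tag."""
    name = unquote(filename)  # index pages may percent-encode "+" as "%2B"
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # Layout: {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    parts = name[: -len(".whl")].split("-")
    public, _, local = parts[1].partition("+")
    return {"version": public, "local": local, "platform": parts[-1]}


def find_wheels(
    filenames: Iterable[str],
    version: str,
    cuda_tags: Collection[str],
    arch: Optional[str] = None,
) -> list:
    """Return sorted filenames for `version` built for one of `cuda_tags`."""
    hits = []
    for filename in filenames:
        info = parse_wheel(filename)
        if info["version"] != version or info["local"] not in cuda_tags:
            continue
        if arch is not None and not info["platform"].endswith(arch):
            continue
        hits.append(filename)
    return sorted(hits)
```

Feeding it the filenames listed on the cu128 or cu130 index page, rather than the `urls` array from PyPI's JSON API, would show whether an aarch64 build of 0.24.1 exists.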

Clarify that cu128/cu130 wheels are unavailable entirely for torchvision 0.24.1, not just aarch64.

The verification confirms that no cu128/cu130 wheels exist for torchvision 0.24.1 on PyPI. The comment is technically accurate (aarch64 wheels are indeed absent), but its wording suggests an aarch64-specific gap when these CUDA builds are unavailable for every platform. Update the comment to `# torchvision 0.24.1 does not have cu128/cu130 wheels available` rather than implying that aarch64 is uniquely restricted.


coderabbitai · 2026-05-14T14:30:50Z

    "AXOLOTL_EXTRAS": os.environ.get("AXOLOTL_EXTRAS", ""),
    "AXOLOTL_ARGS": os.environ.get("AXOLOTL_ARGS", ""),
    "PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.6.0"),
+    "TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.21.0"),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python - <<'PY'
import re
from pathlib import Path

pp = Path("pyproject.toml").read_text()
torch_min = re.search(r'"torch>=([^"]+)"', pp).group(1)
tv_min = re.search(r'"torchvision>=([^"]+)"', pp).group(1)
print(f"pyproject minimums: torch>={torch_min}, torchvision>={tv_min}")
for path in ("cicd/single_gpu.py", "cicd/multigpu.py"):
    txt = Path(path).read_text()
    t = re.search(r'"PYTORCH_VERSION":\s*os\.environ\.get\("PYTORCH_VERSION",\s*"([^"]+)"\)', txt)
    v = re.search(r'"TORCHVISION_VERSION":\s*os\.environ\.get\("TORCHVISION_VERSION",\s*"([^"]+)"\)', txt)
    print(f"{path} defaults: torch={t.group(1)}, torchvision={v.group(1)}")
PY
rg -n 'TORCHVISION_VERSION' .github/workflows cicd docker || true

Repository: axolotl-ai-cloud/axolotl

Length of output: 1714

Align default PyTorch/TorchVision versions with pyproject.toml constraints across CI files.

The defaults in cicd/single_gpu.py (line 26), cicd/multigpu.py (line 27), and docker/Dockerfile-uv-base all specify torch=2.6.0 and torchvision=0.21.0, but pyproject.toml requires torch>=2.9.1 and torchvision>=0.24.1. Without environment variable overrides, this silently builds against an older, incompatible ABI. Update these defaults to 2.9.1 and 0.24.1 respectively.

Suggested changes

# cicd/single_gpu.py & cicd/multigpu.py
-    "PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.6.0"),
-    "TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.21.0"),
+    "PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.9.1"),
+    "TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.24.1"),

# docker/Dockerfile-uv-base
-ARG PYTORCH_VERSION="2.6.0"
-ARG TORCHVISION_VERSION="0.21.0"
+ARG PYTORCH_VERSION="2.9.1"
+ARG TORCHVISION_VERSION="0.24.1"
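Below is a minimal sketch of a guard that would catch this class of drift by comparing the environment-variable fallbacks against the `pyproject.toml` minimums. `resolve_build_args` mirrors the `os.environ.get(...)` pattern in `cicd/single_gpu.py` with the proposed defaults. The `MINIMUMS` table and `check_defaults` helper are hypothetical. Versions are compared as integer tuples rather than with `packaging.version` to keep the sketch dependency-free, so pre-release suffixes are not handled.

```python
import os
from typing import Mapping

# Hypothetical copy of the pyproject.toml lower bounds quoted above.
MINIMUMS = {"PYTORCH_VERSION": "2.9.1", "TORCHVISION_VERSION": "0.24.1"}


def as_tuple(version: str) -> tuple:
    # Strip a local label such as "+cu128"; assumes purely numeric components.
    return tuple(int(p) for p in version.split("+", 1)[0].split("."))


def resolve_build_args(env: Mapping[str, str] = os.environ) -> dict:
    # Same fallback pattern as cicd/single_gpu.py, with the proposed defaults.
    return {
        "PYTORCH_VERSION": env.get("PYTORCH_VERSION", "2.9.1"),
        "TORCHVISION_VERSION": env.get("TORCHVISION_VERSION", "0.24.1"),
    }


def check_defaults(build_args: dict) -> list:
    """Return one message for every version below its pyproject minimum."""
    problems = []
    for key, minimum in MINIMUMS.items():
        if as_tuple(build_args[key]) < as_tuple(minimum):
            problems.append(f"{key}={build_args[key]} is below the minimum {minimum}")
    return problems
```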


coderabbitai · 2026-05-14T14:30:50Z

+        if data.get("fsdp") or data.get("fsdp_config"):
+            fsdp_version = cls._resolve_fsdp_version(data)
+            if str(fsdp_version) != "2":
+                raise ValueError(
+                    "q_galore_adamw8bit requires FSDP2. Set fsdp_version: 2."
+                )
+            fsdp_config = data.get("fsdp_config") or {}
+            if fsdp_config.get("use_orig_params") is False:
+                raise ValueError(
+                    "q_galore_adamw8bit requires fsdp_config.use_orig_params=True so "
+                    "that per-parameter projection state survives FSDP sharding."
+                )


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for FSDP2 use_orig_params default behavior
rg -n -C3 'use_orig_params' --type=py -g '!test*'

Repository: axolotl-ai-cloud/axolotl

Length of output: 4102

Fix use_orig_params validation to reject unset values.

The validation at line 941 checks if fsdp_config.get("use_orig_params") is False, which only rejects explicitly set False values. When use_orig_params is omitted (defaults to None), the check passes despite the requirement that it must be explicitly True. Since the schema allows None as the default and q_galore_adamw8bit requires the projection state to survive FSDP sharding, the validator should enforce explicit enablement:

Suggested fix

if fsdp_config.get("use_orig_params") is not True:
    raise ValueError(
        "q_galore_adamw8bit requires fsdp_config.use_orig_params=True so "
        "that per-parameter projection state survives FSDP sharding."
    )
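Below is a minimal standalone sketch of the tightened check operating on a plain config dict. `resolve_fsdp_version` is a simplified, hypothetical stand-in for `cls._resolve_fsdp_version`: it only reads a top-level `fsdp_version` key and assumes FSDP1 when that key is absent, so the real resolution logic may differ. The error text is returned rather than raised so each case is easy to check.

```python
from typing import Optional

QGALORE = "q_galore_adamw8bit"


def resolve_fsdp_version(data: dict) -> str:
    # Hypothetical stand-in for cls._resolve_fsdp_version: reads a top-level
    # `fsdp_version` key and assumes FSDP1 when it is absent.
    return str(data.get("fsdp_version", 1))


def qgalore_fsdp_error(data: dict) -> Optional[str]:
    """Return an error message if the FSDP setup is incompatible, else None."""
    if data.get("optimizer") != QGALORE:  # assumes the `optimizer` config key
        return None
    if not (data.get("fsdp") or data.get("fsdp_config")):
        return None  # FSDP not configured, nothing to check here
    if resolve_fsdp_version(data) != "2":
        return f"{QGALORE} requires FSDP2. Set fsdp_version: 2."
    fsdp_config = data.get("fsdp_config") or {}
    # `is not True` rejects an explicit False and also an omitted/None value.
    if fsdp_config.get("use_orig_params") is not True:
        return (
            f"{QGALORE} requires fsdp_config.use_orig_params=True so "
            "that per-parameter projection state survives FSDP sharding."
        )
    return None


def check_qgalore_fsdp(data: dict) -> None:
    error = qgalore_fsdp_error(data)
    if error is not None:
        raise ValueError(error)
```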


codecov · 2026-05-14T14:38:03Z

Codecov Report

❌ Patch coverage is 43.28358% with 38 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/axolotl/utils/optimizers/qgalore.py	0.00%	25 Missing ⚠️
src/axolotl/core/builders/base.py	11.11%	8 Missing ⚠️
src/axolotl/utils/schemas/validation.py	77.27%	5 Missing ⚠️


ved1beta and others added 11 commits May 13, 2026 20:16

fix: pin torchvision per matrix entry to prevent ABI drift

a03aace

toml

fe5e8ac

build fail version update

3a4ca3b

dont smoke test

bf030c1

add venv

e74abaf

==${TORCHVISION_VERSION}

9bde764

pin to 24.1

256e374

rmv torch

38927c3

redo

1f95d49

undo

f720ced

q ga lora main

ecca61c

Your Name added 2 commits May 14, 2026 16:45

bnb patch

3dd6eee

minimal bnb patch qa lora

4407920

ved1beta marked this pull request as ready for review May 14, 2026 14:25

coderabbitai Bot reviewed May 14, 2026

ved1beta closed this May 14, 2026

ved1beta wants to merge 13 commits into axolotl-ai-cloud:main from ved1beta:feat-qgalore
