Closed
30 changes: 27 additions & 3 deletions .github/workflows/base.yml
@@ -35,14 +35,17 @@ jobs:
cudnn_version: ""
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
# arm64 disabled: torchvision 0.24.1+cu128 has no aarch64 wheel
platforms: "linux/amd64"
Comment on lines +41 to +42


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check PyPI for torchvision 0.24.1 wheel availability by platform and CUDA version

echo "=== Checking torchvision 0.24.1 wheels ==="
curl -s https://pypi.org/pypi/torchvision/0.24.1/json | \
  jq -r '.urls[] | select(.packagetype == "bdist_wheel") | .filename' | \
  grep -E "(cu128|cu130)" | sort

echo ""
echo "=== Filtering for aarch64 wheels ==="
curl -s https://pypi.org/pypi/torchvision/0.24.1/json | \
  jq -r '.urls[] | select(.packagetype == "bdist_wheel") | .filename' | \
  grep -E "(cu128|cu130)" | grep aarch64 || echo "No aarch64 wheels found for cu128/cu130"

Repository: axolotl-ai-cloud/axolotl

Length of output: 189


Clarify that cu128/cu130 wheels are unavailable entirely for torchvision 0.24.1, not just aarch64.

The verification confirms that no cu128/cu130 wheels exist for torchvision 0.24.1 on PyPI. The comment is technically accurate (aarch64 wheels are indeed absent), but its wording suggests the unavailability is aarch64-specific, when these CUDA variants are not available for any platform. Update the comment to `# torchvision 0.24.1 does not have cu128/cu130 wheels available` rather than implying that aarch64 is uniquely restricted.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@.github/workflows/base.yml` around lines 41 - 42, Update the inline comment
above the "platforms: \"linux/amd64\"" setting to state that the cu128/cu130
wheels for torchvision 0.24.1 are unavailable for any platform rather than only
aarch64; replace the current comment `# arm64 disabled: torchvision 0.24.1+cu128
has no aarch64 wheel` with something like `# torchvision 0.24.1 does not have
cu128/cu130 wheels available` so the reason for restricting platforms is
accurate and unambiguous.
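One caveat on the check above: PyPI does not accept uploads whose version carries a PEP 440 local label such as `+cu128`, so CUDA-tagged torchvision wheels are published on the PyTorch index (for example `https://download.pytorch.org/whl/cu128`) instead. A minimal sketch for filtering a wheel listing by version, CUDA tag, and architecture, assuming the filenames have already been scraped from an index page; `parse_wheel_filename`, `matching_wheels`, and the sample filenames are hypothetical, and fetching/HTML parsing is left out.

```python
from typing import Iterable, NamedTuple
from urllib.parse import unquote


class WheelTags(NamedTuple):
    """Fields of {dist}-{version}[-{build}]-{python}-{abi}-{platform}.whl"""

    dist: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its tags; the optional build tag is dropped."""
    name = unquote(filename)  # index links usually encode '+' as %2B
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) == 6:
        parts = parts[:2] + parts[3:]  # drop the build tag
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(*parts)


def matching_wheels(
    filenames: Iterable[str], version: str, local: str, arch: str
) -> list[str]:
    """Return filenames whose version is exactly '{version}+{local}' and whose
    platform tag mentions `arch` (e.g. 'aarch64' or 'x86_64')."""
    wanted = f"{version}+{local}"
    hits = []
    for name in filenames:
        tags = parse_wheel_filename(name)
        if tags.version == wanted and arch in tags.platform:
            hits.append(name)
    return hits


# Illustrative filenames only; not a claim about what any index actually hosts.
candidates = [
    "torchvision-0.24.1%2Bcu128-cp311-cp311-manylinux_2_28_x86_64.whl",
    "torchvision-0.24.1-cp311-cp311-manylinux_2_28_aarch64.whl",
]
print(matching_wheels(candidates, "0.24.1", "cu128", "aarch64"))  # []
print(matching_wheels(candidates, "0.24.1", "cu128", "x86_64"))
```

Matching on the full `version+local` string, rather than grepping for `cu128` anywhere in the name, keeps an untagged `0.24.1` wheel from being counted as a CUDA build.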

- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.10.0
torchvision: 0.25.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
@@ -51,6 +54,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.10.0
torchvision: 0.25.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
@@ -59,6 +63,7 @@ jobs:
# cudnn_version: ""
# python_version: "3.12"
# pytorch: 2.9.1
# torchvision: 0.24.1
# torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
# dockerfile: "Dockerfile-base"
# platforms: "linux/amd64,linux/arm64"
@@ -67,22 +72,27 @@ jobs:
cudnn_version: ""
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
# arm64 disabled: torchvision 0.24.1+cu130 has no aarch64 wheel
platforms: "linux/amd64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
# arm64 disabled: torchvision 0.24.1+cu130 has no aarch64 wheel
platforms: "linux/amd64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.10.0
torchvision: 0.25.0
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
@@ -91,6 +101,7 @@ jobs:
# cudnn_version: ""
# python_version: "3.11"
# pytorch: nightly
# torchvision: nightly
# torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
# dockerfile: "Dockerfile-base-nightly"
# # "next" is for release candidates of pytorch
@@ -99,6 +110,7 @@ jobs:
# cudnn_version: ""
# python_version: "3.11"
# pytorch: next
# torchvision: next
# torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
# dockerfile: "Dockerfile-base-next"
steps:
@@ -133,6 +145,7 @@ jobs:
CUDA=${{ matrix.cuda }}
PYTHON_VERSION=${{ matrix.python_version }}
PYTORCH_VERSION=${{ matrix.pytorch }}
TORCHVISION_VERSION=${{ matrix.torchvision }}
TORCH_CUDA_ARCH_LIST=${{ matrix.torch_cuda_arch_list }}
build-base-uv:
if: ${{ github.repository_owner == 'axolotl-ai-cloud' && (github.event_name != 'pull_request' || !github.event.pull_request.draft) }}
@@ -149,6 +162,7 @@ jobs:
cudnn_version: ""
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -157,6 +171,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -165,6 +180,7 @@ jobs:
cudnn_version: ""
python_version: "3.11"
pytorch: 2.10.0
torchvision: 0.25.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -173,6 +189,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.10.0
torchvision: 0.25.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -181,6 +198,7 @@ jobs:
# cudnn_version: ""
# python_version: "3.12"
# pytorch: 2.9.1
# torchvision: 0.24.1
# torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
# dockerfile: "Dockerfile-uv-base"
# platforms: "linux/amd64,linux/arm64"
@@ -189,6 +207,7 @@ jobs:
cudnn_version: ""
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -197,6 +216,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.9.1
torchvision: 0.24.1
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -205,6 +225,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.10.0
torchvision: 0.25.0
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -213,6 +234,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.11.0
torchvision: 0.26.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -221,6 +243,7 @@ jobs:
cudnn_version: ""
python_version: "3.12"
pytorch: 2.11.0
torchvision: 0.26.0
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
@@ -256,4 +279,5 @@ jobs:
CUDA=${{ matrix.cuda }}
PYTHON_VERSION=${{ matrix.python_version }}
PYTORCH_VERSION=${{ matrix.pytorch }}
TORCHVISION_VERSION=${{ matrix.torchvision }}
TORCH_CUDA_ARCH_LIST=${{ matrix.torch_cuda_arch_list }}
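Every uncommented matrix entry in `base.yml` now pins torchvision next to its torch release: 2.9.1 with 0.24.1, 2.10.0 with 0.25.0, and 2.11.0 with 0.26.0. A hypothetical lint (not part of this PR) that flags entries breaking that pairing might look like the sketch below; it assumes the matrix has already been loaded into a list of dicts (for example with a YAML parser) and treats the three pairs above as the only supported ones.

```python
# torch -> torchvision pairs as they appear in the base.yml matrix.
# Assumption: these are the only combinations the workflow should build.
TORCHVISION_FOR_TORCH = {
    "2.9.1": "0.24.1",
    "2.10.0": "0.25.0",
    "2.11.0": "0.26.0",
}


def mismatched_entries(matrix: list[dict]) -> list[dict]:
    """Return matrix entries whose torchvision pin is missing, unknown, or
    does not match the expected pair for their torch version."""
    bad = []
    for entry in matrix:
        expected = TORCHVISION_FOR_TORCH.get(str(entry["pytorch"]))
        if expected is None or str(entry.get("torchvision")) != expected:
            bad.append(entry)
    return bad


matrix = [
    {"cuda": "128", "pytorch": "2.9.1", "torchvision": "0.24.1"},
    {"cuda": "130", "pytorch": "2.10.0", "torchvision": "0.25.0"},
    {"cuda": "130", "pytorch": "2.11.0", "torchvision": "0.25.0"},  # deliberately wrong
]
print(mismatched_entries(matrix))
```

Unknown torch versions (such as the commented-out `nightly` and `next` entries) are reported rather than silently accepted, so a new torch release forces an explicit update to the table.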
4 changes: 4 additions & 0 deletions .github/workflows/multi-gpu-e2e.yml
@@ -35,20 +35,23 @@ jobs:
# cuda_version: 12.9.1
# python_version: "3.12"
# pytorch: 2.9.1
# torchvision: 0.24.1
# axolotl_extras: "fbgemm-gpu"
# num_gpus: 2
# dockerfile: "Dockerfile-uv.jinja"
- cuda: 130
cuda_version: 13.0.0
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
axolotl_extras:
# axolotl_extras: fbgemm-gpu
num_gpus: 2
- cuda: 128
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.10.0
torchvision: 0.25.0
axolotl_extras: "fbgemm-gpu"
num_gpus: 2
runs-on: [self-hosted, modal]
@@ -68,6 +71,7 @@ jobs:
run: |
echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
echo "TORCHVISION_VERSION=${{ matrix.torchvision}}" >> $GITHUB_ENV
echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
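Each `echo "KEY=value" >> $GITHUB_ENV` line makes that variable available to the later steps of the same job, which is presumably how `TORCHVISION_VERSION` reaches `os.environ` in `cicd/multigpu.py` and `cicd/single_gpu.py`. A rough Python sketch of what this step writes for one matrix entry, and how a plain `KEY=VALUE` file reads back; `env_lines` and `parse_env_file` are hypothetical helpers, and the parser deliberately ignores the multiline delimiter syntax that `GITHUB_ENV` also supports.

```python
def env_lines(matrix: dict) -> list[str]:
    """Reproduce the KEY=VALUE lines the echo step appends to $GITHUB_ENV.
    Missing or empty matrix keys render as empty strings, as in the workflow."""
    base_tag = (
        f"main-base-py{matrix['python_version']}"
        f"-cu{matrix['cuda']}-{matrix['pytorch']}"
    )
    return [
        f"BASE_TAG={base_tag}",
        f"PYTORCH_VERSION={matrix['pytorch']}",
        f"TORCHVISION_VERSION={matrix['torchvision']}",
        f"AXOLOTL_ARGS={matrix.get('axolotl_args') or ''}",
        f"AXOLOTL_EXTRAS={matrix.get('axolotl_extras') or ''}",
        f"CUDA={matrix['cuda']}",
    ]


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; a later assignment to the same key wins."""
    env = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)  # values may themselves contain '='
            env[key] = value
    return env


# Values taken from the cu128 / torch 2.10.0 entry in multi-gpu-e2e.yml above.
entry = {
    "cuda": 128, "cuda_version": "12.8.1", "python_version": "3.11",
    "pytorch": "2.10.0", "torchvision": "0.25.0",
    "axolotl_extras": "fbgemm-gpu", "num_gpus": 2,
}
env = parse_env_file("\n".join(env_lines(entry)))
print(env["BASE_TAG"])  # main-base-py3.11-cu128-2.10.0
```

Note that `BASE_TAG` encodes Python, CUDA, and torch versions but not torchvision, so the torchvision pin travels only through `TORCHVISION_VERSION`.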
6 changes: 6 additions & 0 deletions .github/workflows/tests-nightly.yml
@@ -119,19 +119,22 @@ jobs:
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
num_gpus: 1
axolotl_extras:
nightly_build: "true"
- cuda: 128
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.10.0
torchvision: 0.25.0
num_gpus: 1
axolotl_extras:
- cuda: 130
cuda_version: 13.0.0
python_version: "3.12"
pytorch: 2.9.1
torchvision: 0.24.1
num_gpus: 1
axolotl_extras:
nightly_build: "true"
@@ -150,6 +153,7 @@ jobs:
run: |
echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
echo "TORCHVISION_VERSION=${{ matrix.torchvision}}" >> $GITHUB_ENV
echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
@@ -176,6 +180,7 @@ jobs:
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
num_gpus: 2
axolotl_extras:
nightly_build: "true"
@@ -194,6 +199,7 @@ jobs:
run: |
echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
echo "TORCHVISION_VERSION=${{ matrix.torchvision}}" >> $GITHUB_ENV
echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
6 changes: 6 additions & 0 deletions .github/workflows/tests.yml
@@ -275,6 +275,7 @@ jobs:
cuda_version: 13.0.0
python_version: "3.12"
pytorch: 2.9.1
torchvision: 0.24.1
num_gpus: 1
axolotl_extras:
steps:
@@ -292,6 +293,7 @@ jobs:
run: |
echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
echo "TORCHVISION_VERSION=${{ matrix.torchvision}}" >> $GITHUB_ENV
echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
@@ -324,18 +326,21 @@ jobs:
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
num_gpus: 1
axolotl_extras:
- cuda: 128
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.10.0
torchvision: 0.25.0
num_gpus: 1
axolotl_extras:
- cuda: 130
cuda_version: 13.0.0
python_version: "3.11"
pytorch: 2.9.1
torchvision: 0.24.1
num_gpus: 1
axolotl_extras:
steps:
@@ -353,6 +358,7 @@ jobs:
run: |
echo "BASE_TAG=main-base-py${{ matrix.python_version }}-cu${{ matrix.cuda }}-${{ matrix.pytorch }}" >> $GITHUB_ENV
echo "PYTORCH_VERSION=${{ matrix.pytorch}}" >> $GITHUB_ENV
echo "TORCHVISION_VERSION=${{ matrix.torchvision}}" >> $GITHUB_ENV
echo "AXOLOTL_ARGS=${{ matrix.axolotl_args}}" >> $GITHUB_ENV
echo "AXOLOTL_EXTRAS=${{ matrix.axolotl_extras}}" >> $GITHUB_ENV
echo "CUDA=${{ matrix.cuda }}" >> $GITHUB_ENV
13 changes: 8 additions & 5 deletions cicd/Dockerfile-uv.jinja
@@ -1,10 +1,12 @@
FROM axolotlai/axolotl-base-uv:{{ BASE_TAG }}

ENV VIRTUAL_ENV="/workspace/axolotl-venv"
ENV TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 9.0+PTX"
ENV AXOLOTL_EXTRAS="{{ AXOLOTL_EXTRAS }}"
ENV AXOLOTL_ARGS="{{ AXOLOTL_ARGS }}"
ENV CUDA="{{ CUDA }}"
ENV PYTORCH_VERSION="{{ PYTORCH_VERSION }}"
ENV TORCHVISION_VERSION="{{ TORCHVISION_VERSION }}"
ENV GITHUB_REF="{{ GITHUB_REF }}"
ENV GITHUB_SHA="{{ GITHUB_SHA }}"
ENV NIGHTLY_BUILD="{{ NIGHTLY_BUILD }}"
@@ -23,13 +25,14 @@ RUN git fetch origin +$GITHUB_REF && \
git checkout FETCH_HEAD

RUN uv pip install packaging==26.0 setuptools==78.1.1
RUN uv pip install torchvision
RUN uv pip uninstall causal_conv1d
RUN if [ "$AXOLOTL_EXTRAS" != "" ] ; then \
uv pip install --no-build-isolation -e .[deepspeed,flash-attn,ring-flash-attn,optimizers,ray,$AXOLOTL_EXTRAS] $AXOLOTL_ARGS; \
RUN uv pip freeze | grep -E "^(torch|torchvision)==" > /tmp/torch-pin.txt && \
if [ "$AXOLOTL_EXTRAS" != "" ] ; then \
uv pip install --no-build-isolation -e .[deepspeed,flash-attn,ring-flash-attn,optimizers,ray,$AXOLOTL_EXTRAS] $AXOLOTL_ARGS --override /tmp/torch-pin.txt; \
else \
uv pip install --no-build-isolation -e .[deepspeed,flash-attn,ring-flash-attn,optimizers,ray] $AXOLOTL_ARGS; \
fi
uv pip install --no-build-isolation -e .[deepspeed,flash-attn,ring-flash-attn,optimizers,ray] $AXOLOTL_ARGS --override /tmp/torch-pin.txt; \
fi && \
python -c "import torch, torchvision; torchvision.ops.nms; print('OK', torch.__version__, torchvision.__version__)"

# Override with nightly HF packages for nightly builds
RUN if [ "$NIGHTLY_BUILD" = "true" ] ; then \
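The new `RUN` step captures the torch and torchvision versions already installed in the base image and passes them to `uv pip install --override`, which forces those exact versions regardless of what axolotl's extras declare. The Python equivalent of the `grep -E "^(torch|torchvision)=="` filter below, run against made-up freeze output, shows which lines end up in `/tmp/torch-pin.txt`; `torch_pins` and the sample text are illustrative only.

```python
import re

# Same pattern as the grep in the Dockerfile: the line must start with the
# package name and be followed immediately by '=='.
PIN_PATTERN = re.compile(r"^(torch|torchvision)==")


def torch_pins(freeze_output: str) -> list[str]:
    """Keep only the torch/torchvision lines from `uv pip freeze` output."""
    return [line for line in freeze_output.splitlines() if PIN_PATTERN.match(line)]


# Hypothetical freeze output for illustration.
freeze = "\n".join([
    "packaging==26.0",
    "torch==2.9.1+cu128",
    "torchaudio==2.9.1+cu128",
    "torchvision==0.24.1+cu128",
])
print(torch_pins(freeze))  # ['torch==2.9.1+cu128', 'torchvision==0.24.1+cu128']
```

`torchaudio` is excluded because the pattern requires `==` right after `torch`, and the captured pins keep their `+cu128` local label, so the override refers to the exact CUDA builds already present.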
1 change: 1 addition & 0 deletions cicd/multigpu.py
@@ -24,6 +24,7 @@
"AXOLOTL_EXTRAS": os.environ.get("AXOLOTL_EXTRAS", ""),
"AXOLOTL_ARGS": os.environ.get("AXOLOTL_ARGS", ""),
"PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.6.0"),
"TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.21.0"),
"BASE_TAG": os.environ.get("BASE_TAG", "main-base-py3.11-cu126-2.6.0"),
"CUDA": os.environ.get("CUDA", "126"),
"GITHUB_REF": os.environ.get("GITHUB_REF", "refs/heads/main"),
1 change: 1 addition & 0 deletions cicd/single_gpu.py
@@ -23,6 +23,7 @@
"AXOLOTL_EXTRAS": os.environ.get("AXOLOTL_EXTRAS", ""),
"AXOLOTL_ARGS": os.environ.get("AXOLOTL_ARGS", ""),
"PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.6.0"),
"TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.21.0"),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import re
from pathlib import Path

pp = Path("pyproject.toml").read_text()
torch_min = re.search(r'"torch>=([^"]+)"', pp).group(1)
tv_min = re.search(r'"torchvision>=([^"]+)"', pp).group(1)
print(f"pyproject minimums: torch>={torch_min}, torchvision>={tv_min}")

for path in ("cicd/single_gpu.py", "cicd/multigpu.py"):
    txt = Path(path).read_text()
    t = re.search(r'"PYTORCH_VERSION":\s*os\.environ\.get\("PYTORCH_VERSION",\s*"([^"]+)"\)', txt)
    v = re.search(r'"TORCHVISION_VERSION":\s*os\.environ\.get\("TORCHVISION_VERSION",\s*"([^"]+)"\)', txt)
    print(f"{path} defaults: torch={t.group(1)}, torchvision={v.group(1)}")
PY

rg -n 'TORCHVISION_VERSION' .github/workflows cicd docker || true

Repository: axolotl-ai-cloud/axolotl

Length of output: 1714


Align default PyTorch/TorchVision versions with pyproject.toml constraints across CI files.

The defaults in cicd/single_gpu.py (line 26), cicd/multigpu.py (line 27), and docker/Dockerfile-uv-base all specify torch=2.6.0 and torchvision=0.21.0, but pyproject.toml requires torch>=2.9.1 and torchvision>=0.24.1. Without environment variable overrides, this silently builds against an older, incompatible ABI. Update these defaults to 2.9.1 and 0.24.1 respectively.

Suggested changes
# cicd/single_gpu.py & cicd/multigpu.py
-    "PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.6.0"),
-    "TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.21.0"),
+    "PYTORCH_VERSION": os.environ.get("PYTORCH_VERSION", "2.9.1"),
+    "TORCHVISION_VERSION": os.environ.get("TORCHVISION_VERSION", "0.24.1"),

# docker/Dockerfile-uv-base
-ARG TORCHVISION_VERSION="0.21.0"
+ARG TORCHVISION_VERSION="0.24.1"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cicd/single_gpu.py` at line 26: update the CI default PyTorch/TorchVision
versions to match the pyproject.toml constraints. Change the default environment
values used in the cicd single-/multi-GPU configs so that PYTORCH_VERSION is "2.9.1"
and TORCHVISION_VERSION is "0.24.1"; locate the keys named PYTORCH_VERSION
and TORCHVISION_VERSION (e.g., the line "TORCHVISION_VERSION":
os.environ.get("TORCHVISION_VERSION", "0.21.0") in single_gpu.py and the
analogous line in multigpu.py) and update their fallback strings to "2.9.1" and
"0.24.1" respectively (also ensure Dockerfile-uv-base uses the same versions).
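The review's concern is that a fallback default can silently sit below the minimum declared in `pyproject.toml`. A hand-rolled sketch of that check, assuming plain numeric release versions (no pre-release, dev, or post segments) and using the values reported in the review above; `version_tuple` and `below_minimum` are hypothetical helpers, and in practice `packaging.version.Version` handles the full PEP 440 grammar.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.9.1' or '2.9.1+cu128' into (2, 9, 1). The local label is ignored
    and trailing zeros are dropped so that '2.9' == '2.9.0', as in PEP 440."""
    release = version.split("+", 1)[0]
    parts = [int(part) for part in release.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def below_minimum(
    defaults: dict[str, str], minimums: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Map each package whose default is lower than its minimum to (default, minimum)."""
    return {
        name: (defaults[name], minimum)
        for name, minimum in minimums.items()
        if name in defaults and version_tuple(defaults[name]) < version_tuple(minimum)
    }


defaults = {"torch": "2.6.0", "torchvision": "0.21.0"}  # fallbacks in cicd/single_gpu.py
minimums = {"torch": "2.9.1", "torchvision": "0.24.1"}  # as reported from pyproject.toml
print(below_minimum(defaults, minimums))
# {'torch': ('2.6.0', '2.9.1'), 'torchvision': ('0.21.0', '0.24.1')}
```

Comparing integer tuples rather than strings matters here: as strings, `"2.10.0" < "2.9.1"` is true, which would wrongly flag torch 2.10.0.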

"BASE_TAG": os.environ.get("BASE_TAG", "main-base-py3.11-cu126-2.6.0"),
"CUDA": os.environ.get("CUDA", "126"),
"GITHUB_REF": os.environ.get("GITHUB_REF", "refs/heads/main"),
8 changes: 5 additions & 3 deletions docker/Dockerfile-base
@@ -10,7 +10,8 @@ ENV PATH="/root/miniconda3/bin:${PATH}"

ARG TARGETARCH
ARG PYTHON_VERSION="3.11"
ARG PYTORCH_VERSION="2.1.2"
ARG PYTORCH_VERSION="2.9.1"
ARG TORCHVISION_VERSION="0.24.1"
ARG CUDA="128"
ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 9.0+PTX"

@@ -44,8 +45,9 @@ ENV PATH="/root/miniconda3/envs/py${PYTHON_VERSION}/bin:${PATH}"
WORKDIR /workspace

RUN python3 -m pip install --upgrade pip && pip3 install -U packaging==26.0 setuptools==75.8.0 wheel psutil && \
python3 -m pip install --no-cache-dir -U torch==${PYTORCH_VERSION}+cu${CUDA} torchvision --extra-index-url https://download.pytorch.org/whl/cu$CUDA && \
python3 -m pip cache purge
python3 -m pip install --no-cache-dir -U torch==${PYTORCH_VERSION}+cu${CUDA} torchvision==${TORCHVISION_VERSION}+cu${CUDA} --extra-index-url https://download.pytorch.org/whl/cu$CUDA && \
python3 -m pip cache purge && \
python3 -c "import torch, torchvision; torchvision.ops.nms; print('OK', torch.__version__, torchvision.__version__)"

RUN if [ "$CUDA" != "130" ] ; then \
CAUSAL_CONV1D_FORCE_CXX11_ABI=TRUE CAUSAL_CONV1D_FORCE_BUILD=TRUE python3 -m pip install --no-cache-dir "causal_conv1d @ git+https://github.com/Dao-AILab/causal-conv1d.git@v1.5.4"; \
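The updated `Dockerfile-base` install line now pins torchvision to the same `+cu${CUDA}` local tag as torch instead of leaving the torchvision version to the resolver. A Python sketch of the arguments that line expands to for a given set of build args; `torch_install_args` is a hypothetical helper that mirrors only the flags visible in the diff.

```python
def torch_install_args(pytorch: str, torchvision: str, cuda: str) -> list[str]:
    """Build the pinned `pip install` arguments used by the Dockerfile-base RUN step."""
    return [
        "install", "--no-cache-dir", "-U",
        f"torch=={pytorch}+cu{cuda}",
        f"torchvision=={torchvision}+cu{cuda}",
        "--extra-index-url", f"https://download.pytorch.org/whl/cu{cuda}",
    ]


# ARG defaults from the diff: PYTORCH_VERSION=2.9.1, TORCHVISION_VERSION=0.24.1, CUDA=128.
args = torch_install_args("2.9.1", "0.24.1", "128")
print(" ".join(args))
```

Because both requirements carry the same CUDA tag and the index URL is derived from that same `cuda` value, changing `CUDA` alone moves all three together.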
2 changes: 2 additions & 0 deletions docker/Dockerfile-uv
@@ -1,6 +1,8 @@
ARG BASE_TAG=main-base
FROM axolotlai/axolotl-base-uv:$BASE_TAG

ENV VIRTUAL_ENV="/workspace/axolotl-venv"

ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
ARG AXOLOTL_EXTRAS=""
ARG AXOLOTL_ARGS=""