Closed
39 commits
97c2635
cp fix for mamba models (#3572) [skip ci]
ved1beta May 22, 2026
a4b00df
feat: update transformers to 5.8.1 (#3650) [skip ci]
NanoCode012 May 22, 2026
20f56fa
feat-qgalore (#3654) [skip ci]
ved1beta May 22, 2026
d198094
feat support with autoprocessor (#3656) [skip ci]
ved1beta May 22, 2026
bccc1e5
fix AssertionError: Original QKV code not found (#3657) [skip ci]
ved1beta May 22, 2026
65b5308
simpo (#3665) [skip ci]
ved1beta May 22, 2026
1d68aca
fix test_rm_lora rmv skip (#3669)
ved1beta May 22, 2026
a50dd98
fix: ep test missed teardown (#3674)
NanoCode012 May 26, 2026
b433cbd
fix broken MX tests from transformers 5.8.1 upgrade (#3679) [skip ci]
winglian May 26, 2026
b05ab9a
feat(fsdp2): add fp32_norms for keeping RMSNorm/LayerNorm in fp32 (#3…
winglian May 26, 2026
ab1a0d8
latest typer breaks HF CLI (#3684) [skip ci]
winglian May 26, 2026
3c4ff59
fix flaky ep tests (#3683) [skip ci]
winglian May 26, 2026
3aeb078
fix: shim Gemma4 use_kernels kernelize() crash on vision tower (#3687)
winglian May 28, 2026
d452e65
update to use same as latest it gemma4 chat template (#3686) [skip ci]
winglian May 28, 2026
3f478db
fix: refactor kernels patch to drop routing and inject into Expert (#…
NanoCode012 May 28, 2026
5c1a266
Fused ScatterMoE-LoRA for MXFP4 weights (#3663) [skip ci]
winglian May 28, 2026
9a79b68
tiled-MLP for MoE: MoE block patcher + FSDP2 reshard fix + grad-accum…
winglian May 28, 2026
135c4ee
fix modal call with explicit module flag for future deprecation (#366…
winglian May 28, 2026
91adc26
fix: respect has_aux contract in KD liger chunked loss (#3660)
roycho96 May 28, 2026
ead6bc7
scattermoe-lora: INT64_INDICES tl.constexpr in scatter2scatter family…
winglian May 28, 2026
d3592f3
feat(mm-cpt): expand dataset pipeline support
thad0ctor May 28, 2026
45849b7
docs(mm-cpt): clarify dataset and resume paths
thad0ctor May 28, 2026
a2f1193
docs(mm-cpt): note epoch-based non-streaming runs
thad0ctor May 28, 2026
f6b7ee0
docs(mm-cpt): add multimodal CPT examples
thad0ctor May 28, 2026
157b7d4
test(mm-cpt): make streaming partial patch resilient
thad0ctor May 28, 2026
ac011a9
fix(data): preserve dataset order in cache hash
thad0ctor May 29, 2026
c42adf4
style(mm-cpt): trim comments and docstrings
thad0ctor May 29, 2026
b6c6528
fix(mm-cpt): fail fast on map dataset encoding
thad0ctor May 29, 2026
280506e
feat(qwen): fused RMSNorm+RoPE for Qwen3/3.X family + Liger m-rope de…
thad0ctor May 29, 2026
5f23d93
fix: compute kd loss in trainer to bypass broken patch/inject paths (…
roycho96 May 29, 2026
bf19bff
test(scattermoe-lora): skip on CUDA OOM under xdist contention (#3689…
winglian May 29, 2026
8d15592
fix(mm-cpt): tighten resume pipeline coverage
thad0ctor May 29, 2026
6da2f9e
add pytorch 2.12 base and prune unused base images (#3697)
winglian Jun 1, 2026
3f6f8c6
bump transformers to 5.9.0 and trl to 1.5.1 (#3696)
winglian Jun 2, 2026
406aee4
prefer latest pytorch as gated e2e tests (#3698)
winglian Jun 2, 2026
09d325b
fix(ci): build pypi release via `uv build` instead of removed setup.p…
winglian Jun 3, 2026
41ef48f
fix(gemma4): key shared KV by layer_type on transformers >=5.8 (#3701)
thad0ctor Jun 4, 2026
e13bf16
test(ci): cut CPU test tail — drop dataset_num_proc to 1, split build…
winglian Jun 4, 2026
2e868f9
Merge upstream/main into feat/mm-cpt-dataset-pipeline-clean
thad0ctor Jun 5, 2026
24 changes: 4 additions & 20 deletions .github/workflows/base.yml
@@ -38,14 +38,6 @@ jobs:
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
python_version: "3.11"
pytorch: 2.10.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
cudnn_version: ""
@@ -70,14 +62,6 @@ jobs:
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.9.1
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
@@ -208,19 +192,19 @@ jobs:
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "128"
cuda_version: 12.8.1
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.11.0
torch_cuda_arch_list: "7.0 7.5 8.0 8.6 8.7 8.9 9.0+PTX"
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
- cuda: "130"
cuda_version: 13.0.0
cudnn_version: ""
python_version: "3.12"
pytorch: 2.11.0
pytorch: 2.12.0
torch_cuda_arch_list: "9.0 10.0 10.3 12.0+PTX"
dockerfile: "Dockerfile-uv-base"
platforms: "linux/amd64,linux/arm64"
4 changes: 2 additions & 2 deletions .github/workflows/multi-gpu-e2e.yml
@@ -40,8 +40,8 @@ jobs:
# dockerfile: "Dockerfile-uv.jinja"
- cuda: 130
cuda_version: 13.0.0
python_version: "3.11"
pytorch: 2.9.1
python_version: "3.12"
pytorch: 2.12.0
axolotl_extras:
# axolotl_extras: fbgemm-gpu
num_gpus: 2
22 changes: 8 additions & 14 deletions .github/workflows/pypi.yml
@@ -8,9 +8,6 @@ on:

permissions: {}

env:
UV_SYSTEM_PYTHON: "1"

jobs:
setup_release:
name: Create Release
@@ -24,7 +21,10 @@ jobs:
- name: Create release
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: gh release create "$GITHUB_REF_NAME" --generate-notes
# idempotent: don't fail a re-run if the release already exists
run: |
gh release view "$GITHUB_REF_NAME" >/dev/null 2>&1 \
|| gh release create "$GITHUB_REF_NAME" --generate-notes
pypi-publish:
name: Upload release to PyPI
runs-on: ubuntu-latest
@@ -47,13 +47,6 @@ jobs:
- name: Install uv
uses: astral-sh/setup-uv@v7

- name: Install dependencies
run: |
uv pip install wheel packaging
uv pip install --no-build-isolation -e .
uv pip install black mypy pre-commit types-requests quartodoc jupyter blobfile tiktoken \
codecov codecov-cli pytest pytest-cov pytest-retry pytest-sugar pytest-xdist tbparse

- name: Extract tag name
id: tag
run: echo "TAG_NAME=$(echo $GITHUB_REF | cut -d / -f 3)" >> "$GITHUB_OUTPUT"
@@ -62,9 +55,10 @@ jobs:
run: |
echo "${{ steps.tag.outputs.TAG_NAME }}" | sed 's/^v//' > VERSION

- name: Build a source dist
run: |
python setup.py sdist
- name: Build sdist and wheel
# PEP 517 build via uv (setuptools backend reads the version from VERSION);
# replaces the removed `python setup.py sdist` after the pyproject migration.
run: uv build

- name: Publish package distributions to PyPI
uses: pypa/gh-action-pypi-publish@release/v1
4 changes: 2 additions & 2 deletions .github/workflows/tests-nightly.yml
@@ -160,7 +160,7 @@ jobs:
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
run: |
modal run cicd.e2e_tests
modal run -m cicd.e2e_tests
docker-e2e-multigpu-tests:
if: github.repository_owner == 'axolotl-ai-cloud'
# this job needs to be run on self-hosted GPU runners...
@@ -203,4 +203,4 @@ jobs:
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
run: |
modal run cicd.multigpu
modal run -m cicd.multigpu
24 changes: 9 additions & 15 deletions .github/workflows/tests.yml
@@ -68,11 +68,11 @@ jobs:
fail-fast: false
matrix:
python_version: ["3.12", "3.14"]
pytorch_version: ["2.9.1", "2.10.0"]
pytorch_version: ["2.9.1", "2.10.0", "2.11.0", "2.12.0"]
exclude:
- python_version: "3.14"
pytorch_version: "2.9.1"
timeout-minutes: 25
timeout-minutes: 30

steps:
- name: cleanup node
@@ -155,7 +155,7 @@ jobs:
fail-fast: false
matrix:
python_version: ["3.12", "3.14"]
pytorch_version: ["2.9.1", "2.10.0"]
pytorch_version: ["2.9.1", "2.10.0", "2.11.0", "2.12.0"]
exclude:
- python_version: "3.14"
pytorch_version: "2.9.1"
@@ -274,7 +274,7 @@ jobs:
- cuda: 130
cuda_version: 13.0.0
python_version: "3.12"
pytorch: 2.9.1
pytorch: 2.12.0
num_gpus: 1
axolotl_extras:
steps:
@@ -302,7 +302,7 @@ jobs:
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
run: |
modal run cicd.e2e_tests
modal run -m cicd.e2e_tests

docker-e2e-tests:
if: >
@@ -320,12 +320,6 @@ jobs:
fail-fast: false
matrix:
include:
- cuda: 128
cuda_version: 12.8.1
python_version: "3.11"
pytorch: 2.9.1
num_gpus: 1
axolotl_extras:
- cuda: 128
cuda_version: 12.8.1
python_version: "3.11"
@@ -334,8 +328,8 @@ jobs:
axolotl_extras:
- cuda: 130
cuda_version: 13.0.0
python_version: "3.11"
pytorch: 2.9.1
python_version: "3.12"
pytorch: 2.11.0
num_gpus: 1
axolotl_extras:
steps:
@@ -364,7 +358,7 @@ jobs:
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
run: |
modal run cicd.e2e_tests
modal run -m cicd.e2e_tests

docker-e2e-cleanup:
runs-on: [self-hosted, modal]
@@ -404,4 +398,4 @@ jobs:
echo "N_GPUS=${{ matrix.num_gpus }}" >> $GITHUB_ENV
- name: Run tests job on Modal
run: |
modal run cicd.cleanup
modal run -m cicd.cleanup
1 change: 1 addition & 0 deletions .gitignore
@@ -32,6 +32,7 @@ share/python-wheels/
.installed.cfg
*.egg
MANIFEST
uv.lock

# PyInstaller
# Usually these files are written by a python script from a template
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.16.2.dev0
0.17.0.dev
9 changes: 9 additions & 0 deletions cicd/cicd.sh
@@ -43,11 +43,20 @@ pytest --full-trace -vvv --durations=10 \
--cov-append

# Run solo tests with coverage append
# test_rm_lora is run in its own process below (it fails on py3.11 when sharing
# the solo process with other tests; isolating it avoids cross-test state).
pytest -v --durations=10 -n1 \
--ignore=tests/e2e/solo/test_reward_model_smollm2.py \
/workspace/axolotl/tests/e2e/solo/ \
--cov=axolotl \
--cov-append

# Run reward-model test isolated in its own process
pytest -v --durations=10 -s \
/workspace/axolotl/tests/e2e/solo/test_reward_model_smollm2.py \
--cov=axolotl \
--cov-append

# Run integration tests with coverage append
pytest -v --durations=10 \
/workspace/axolotl/tests/e2e/integrations/ \
20 changes: 20 additions & 0 deletions docs/mixed_precision.qmd
@@ -54,6 +54,26 @@ bf16: true
bf16: full # Equivalent to bf16_full_eval in the HF trainer
```

### Keeping norms in fp32 (FSDP2) {#sec-fp32-norms}

Some models declare RMSNorm/LayerNorm layers as fp32 for training
stability — the variance computation in RMSNorm is numerically poor in
bf16, and the learned gain γ quantizes harshly. With FSDP1 this fights
the flat-param dtype uniformity constraint; with FSDP2 each norm can have
its own `MixedPrecisionPolicy`. Enable with:

```{.yaml}
fsdp_version: 2
fp32_norms: true
# fp32_norm_classes: # optional override
# - RMSNorm
# - LayerNorm
```

Defaults match any class whose name ends in `RMSNorm` or `LayerNorm`. Use
fully qualified names (`module.path.ClassName`) to pin a specific
implementation.
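
As a rough illustration of the matching rule described above, the following Python sketch shows one way norm classes could be selected. It is not Axolotl's implementation: `is_fp32_norm`, `qualified_name`, and the stand-in classes are hypothetical names, and the sketch assumes that bare entries are matched as class-name suffixes while dotted entries must equal the fully qualified class name.

```python
from typing import Iterable, Optional

# Default suffixes taken from the prose above; everything else is illustrative.
DEFAULT_SUFFIXES = ("RMSNorm", "LayerNorm")


def qualified_name(cls: type) -> str:
    """Return `module.path.ClassName` for a class."""
    return f"{cls.__module__}.{cls.__qualname__}"


def is_fp32_norm(cls: type, patterns: Optional[Iterable[str]] = None) -> bool:
    """Hypothetical matcher for classes that should stay in fp32.

    Assumption: a pattern containing a dot is compared against the fully
    qualified name; any other pattern is treated as a class-name suffix.
    """
    if patterns is None:
        patterns = DEFAULT_SUFFIXES
    for pattern in patterns:
        if "." in pattern:
            if qualified_name(cls) == pattern:
                return True
        elif cls.__name__.endswith(pattern):
            return True
    return False


# Stand-in classes so the sketch runs without any model code.
class LlamaRMSNorm:
    pass


class Linear:
    pass


if __name__ == "__main__":
    print(is_fp32_norm(LlamaRMSNorm))  # suffix match on the default list
    print(is_fp32_norm(Linear))        # no suffix matches
    # Pinning one implementation by its fully qualified name.
    print(is_fp32_norm(LlamaRMSNorm, [qualified_name(LlamaRMSNorm)]))
```

Under these assumptions, a suffix entry catches every model-specific norm variant, while a dotted entry restricts the match to a single implementation.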

## FP8 Mixed Precision {#sec-fp8}

::: {.callout-note}