Merged
34 commits
0a2e0e3
fix: lock version in gemma3n docs
NanoCode012 Jul 22, 2025
91235a4
feat: add sample configs and docs
NanoCode012 Jul 22, 2025
6ef8777
chore: move mistraltokenizer into mistral folder
NanoCode012 Jul 22, 2025
ae2db89
feat: update instructions
NanoCode012 Jul 22, 2025
27105f5
feat: add dynamic load voxtral
NanoCode012 Jul 22, 2025
bdd75ff
fix: remove incorrect vision config, add audio
NanoCode012 Jul 23, 2025
51eb504
fix: support voxtral processing strategy and address none in data
NanoCode012 Jul 23, 2025
406b64d
feat: patch mistraltokenizer subclass upstream and add missing
NanoCode012 Jul 23, 2025
a0c445c
feat: update cce commit to include voxtral
NanoCode012 Jul 23, 2025
f535ade
fix: remove old comment
NanoCode012 Jul 24, 2025
f6bbe7e
fix: gemma3 patch not needed anymore
NanoCode012 Jul 24, 2025
c0d1d5a
fix: voxtral modeling code
NanoCode012 Jul 24, 2025
9e7b308
fix: remove incorrect ds path
NanoCode012 Jul 24, 2025
ce134c2
fix: adjust apply chat template parsing
NanoCode012 Jul 24, 2025
f2f1198
feat: enable voxtral patch
NanoCode012 Jul 24, 2025
495077d
fix: patch
NanoCode012 Jul 24, 2025
fd100ba
feat: update example datasets
NanoCode012 Jul 24, 2025
ccd974c
fix: target layer
NanoCode012 Jul 24, 2025
db3f527
feat: update gemma3n docs
NanoCode012 Jul 24, 2025
dbbdbc0
feat: update voxtral docs
NanoCode012 Jul 24, 2025
94b9dcc
feat: revert assistant parsing to rely on new upstream changes
NanoCode012 Jul 24, 2025
7cfbdbc
chore: skip test till next PR fix
NanoCode012 Jul 24, 2025
123e07c
fix: override upstream decode due to missing handling
NanoCode012 Jul 24, 2025
6b1c4f0
feat: update readme
NanoCode012 Jul 24, 2025
309ef74
fix: update
NanoCode012 Jul 24, 2025
b6fab8e
feat: add magistral small think support
NanoCode012 Jul 24, 2025
3f2dbed
feat: update mistral-common dep
NanoCode012 Jul 24, 2025
1dcce68
fix: lint
NanoCode012 Jul 24, 2025
cd19495
fix: remove optional dep
NanoCode012 Jul 24, 2025
2ff1ae7
chore: typing
NanoCode012 Jul 24, 2025
f4af91f
chore: simply import
NanoCode012 Jul 24, 2025
e561301
feat(doc): update differences for 2507
NanoCode012 Jul 24, 2025
8f2a5ac
fix: coderrabbit comments
NanoCode012 Jul 24, 2025
6a2bd24
feat: update clarify docs on new transformers
NanoCode012 Jul 28, 2025
1 change: 1 addition & 0 deletions README.md
@@ -25,6 +25,7 @@

## 🎉 Latest Updates

- 2025/07: Voxtral with mistral-common tokenizer support has been integrated in Axolotl. Read the [docs](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/voxtral)!
- 2025/07: TiledMLP support for single-GPU to multi-GPU training with DDP, DeepSpeed and FSDP support has been added to support Arctic Long Sequence Training (ALST). See [examples](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/alst) for using ALST with Axolotl!
- 2025/06: Magistral with mistral-common tokenizer support has been added to Axolotl. See [examples](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral) to start training your own Magistral models with Axolotl!
- 2025/05: Quantization Aware Training (QAT) support has been added to Axolotl. Explore the [docs](https://docs.axolotl.ai/docs/qat.html) to learn more!
2 changes: 1 addition & 1 deletion examples/colab-notebooks/colab-axolotl-example.ipynb
@@ -40,7 +40,7 @@
"%%capture\n",
"# This step can take ~5-10 minutes to install dependencies\n",
"!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
"!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@631d646\""
"!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@010c3ac3f1e725098961832830303eeb4142dd88\""
]
},
{
64 changes: 55 additions & 9 deletions examples/gemma3n/README.md
@@ -1,19 +1,65 @@
# Gemma-3n
# Finetune Gemma-3n with Axolotl

## Requirements
Gemma-3n is a family of multimodal models from Google found on [HuggingFace](https://huggingface.co/collections/google/gemma-3n-685065323f5984ef315c93f4). This guide shows how to fine-tune it with Axolotl.

In addition to Axolotl's requirements, Gemma-3n requires
## Getting started

```
pip3 install timm
1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). Gemma-3n support is currently only available on main (nightly), so either install from main or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).

Here is an example of how to install from main for pip:

```bash
# Ensure you have Pytorch installed (Pytorch 2.6.0 min recommended)
git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl

pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation -e '.[flash-attn]'
```

If you will load audio datasets, please also install
2. In addition to Axolotl's requirements, Gemma-3n requires:

```bash
pip3 install timm==1.0.17

# for loading audio data
pip3 install librosa==0.11.0
```
pip3 install librosa

3. Run the finetuning example:

```bash
# text only
axolotl train examples/gemma3n/gemma-3n-e2b-qlora.yml

# text + vision
axolotl train examples/gemma3n/gemma-3n-e2b-vision-qlora.yml

# text + vision + audio
axolotl train examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml
```

## Usage
Let us know how it goes. Happy finetuning! 🚀

WARNING: The loss and grad norm will be much higher than normal. We suspect this is inherent to the model at the moment. If you would like to submit a fix for this, we are happy to take a look.

### TIPS

- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
- The text dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
- The multimodal dataset format follows the OpenAI multi-content Messages format as seen [here](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
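
The first tip can be sketched in Python. This is a minimal, line-based sketch and not something Axolotl ships: `strip_qlora_keys` is a hypothetical helper, the `base_model` value is a placeholder, and the code assumes `adapter` and `load_in_4bit` appear as top-level `key: value` lines, as they do in the example configs.

```python
def strip_qlora_keys(yaml_text: str, keys=("adapter", "load_in_4bit")) -> str:
    """Drop top-level `key: value` lines for the given keys from a YAML config.

    Hypothetical helper for illustration only. It does not parse YAML, so it
    handles nothing beyond simple top-level scalar entries.
    """
    kept = []
    for line in yaml_text.splitlines():
        # Top-level keys start at column 0; indented (nested) lines are kept.
        name = line.split(":", 1)[0]
        if line[:1].strip() and name in keys:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Placeholder config text, not one of the real example files.
qlora_cfg = "base_model: your/base-model\nload_in_4bit: true\nadapter: qlora\nlora_r: 32\n"
full_cfg = strip_qlora_keys(qlora_cfg)
print(full_cfg)
```

The sketch removes only the two keys the tip names. Whether leftover `lora_*` keys also need to be removed is not stated here, so check the resulting config before training.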

## Optimization Guides

- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)

## Related Resources

See example configs and the [multimodal doc](https://docs.axolotl.ai/docs/multimodal.html).
- [Gemma 3n Blog](https://ai.google.dev/gemma/docs/gemma-3n)
- [Axolotl Docs](https://docs.axolotl.ai)
- [Axolotl Website](https://axolotl.ai)
- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
2 changes: 0 additions & 2 deletions examples/gemma3n/gemma-3n-e2b-vision-audio-qlora.yml
@@ -34,8 +34,6 @@ eot_tokens:
datasets:
- path: Nanobit/text-vision-audio-2k-test
type: chat_template
data_files:
- dataset.jsonl
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./outputs/out
31 changes: 28 additions & 3 deletions examples/magistral/README.md
@@ -1,6 +1,6 @@
# Finetune Magistral Small with Axolotl

Magistral Small is a 24B parameter opensource model from MistralAI found on [HuggingFace](https://huggingface.co/mistralai/Magistral-Small-2506). This guide shows how to fine-tune it with Axolotl with multi-turn conversations with proper masking.
Magistral Small is a 24B parameter opensource model from MistralAI found on HuggingFace at [2506](https://huggingface.co/mistralai/Magistral-Small-2506) and [2507](https://huggingface.co/mistralai/Magistral-Small-2507) (see [Thinking](#thinking)). This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.

MistralAI has also released a proprietary medium-sized version called Magistral Medium.

@@ -13,7 +13,7 @@ Thanks to the team at MistralAI for giving us early access to prepare for this r
Here is an example of how to install from main for pip:

```bash
# Ensure you have Pytorch installed (Pytorch 2.6.0 recommended)
# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl

@@ -31,12 +31,37 @@ This config uses about 24GB VRAM.

Let us know how it goes. Happy finetuning! 🚀

### Thinking

MistralAI has released their [2507](https://huggingface.co/mistralai/Magistral-Small-2507) model with thinking capabilities. The model requires the multi-content dataset format, which supports an extra `type: thinking` content chunk within system and assistant messages.

Example format:

```json
{
"messages": [
{"role": "system", "content": [{ "type": "text", "text": "{SYSTEM_PROMPT}"}]},
{"role": "user", "content": [{ "type": "text", "text": "..."}]},
{"role": "assistant", "content": [{ "type": "thinking", "thinking": "..."}, { "type": "text", "text": "..." }]}
]
}
```

Example config: `./magistral-small-think-qlora.yaml`.

The `thinking` chunk also supports an optional `closed: bool` argument (default `True`), which controls whether the closing `[/THINK]` tag is added.

Limitations:
- You cannot mix `content: str` with `content: list[dict]`, because `datasets.load_dataset` may complain about mixed types for the `content` key.
- This mode does not work with custom `train_detail` and `training` at the moment.
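
The format and the first limitation above can be sketched in Python. This is a minimal sketch, not Axolotl code: `to_multi_content` and `make_record` are hypothetical helpers, and the sample strings are placeholders. It wraps plain-string content into a list of chunks so that every row has the same `content` type.

```python
import json


def to_multi_content(message: dict) -> dict:
    """Return a copy of `message` whose content is always a list of chunks."""
    content = message["content"]
    if isinstance(content, str):
        # Wrap plain strings so every row has the same `content` type.
        content = [{"type": "text", "text": content}]
    return {"role": message["role"], "content": content}


def make_record(system: str, user: str, thinking: str, answer: str) -> dict:
    """Build one training row in the multi-content format shown above."""
    assistant_chunks = [
        {"type": "thinking", "thinking": thinking},
        {"type": "text", "text": answer},
    ]
    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant_chunks},
    ]
    return {"messages": [to_multi_content(m) for m in messages]}


record = make_record("You are a careful assistant.", "What is 2 + 2?", "2 + 2 = 4.", "4")
line = json.dumps(record)  # one JSONL line
print(line)
```

Writing one such line per example gives a JSONL file in which `content` is a list in every message, which avoids the mixed-type problem described in the first limitation.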

### TIPS

- We recommend adding the same/similar SystemPrompt that the model is tuned for. You can find this within the repo's files titled `SYSTEM_PROMPT.txt`.
- For inference, the official MistralAI team recommends `top_p: 0.95` and `temperature: 0.7` with `max_tokens: 40960`.
- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
- The text dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).

## Optimization Guides

3 changes: 3 additions & 0 deletions examples/magistral/magistral-small-fsdp-qlora.yaml
@@ -6,6 +6,9 @@ tokenizer_use_mistral_common: true
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

load_in_8bit: false
load_in_4bit: true

3 changes: 3 additions & 0 deletions examples/magistral/magistral-small-qlora.yaml
@@ -6,6 +6,9 @@ tokenizer_use_mistral_common: true
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

load_in_8bit: false
load_in_4bit: true

68 changes: 68 additions & 0 deletions examples/magistral/magistral-small-think-qlora.yaml
@@ -0,0 +1,68 @@
base_model: mistralai/Magistral-Small-2507

# Enable to use mistral-common tokenizer
tokenizer_use_mistral_common: true

# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

load_in_8bit: false
load_in_4bit: true

datasets:
- path: Nanobit/text-think-2k-test
type: chat_template

dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./outputs/lora-out

adapter: qlora
lora_model_dir:

sequence_len: 2048
sample_packing: true


lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: false

gradient_checkpointing: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 1
saves_per_epoch: 1

# save_first_step: true # uncomment this to validate checkpoint saving works with your config
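
As a rough illustration of what the batch settings in this config imply, the arithmetic below computes the effective batch size and an upper bound on tokens per optimizer step. It is a sketch under assumptions: `num_gpus = 1` is assumed (the config does not set it), the function names are hypothetical, and with `sample_packing: true` a "sequence" here means one packed row of at most `sequence_len` tokens.

```python
# Values copied from magistral-small-think-qlora.yaml above.
micro_batch_size = 2
gradient_accumulation_steps = 4
sequence_len = 2048
num_gpus = 1  # assumption: single-GPU run; the config does not set this


def effective_batch_size(micro: int, accum: int, gpus: int) -> int:
    """Sequences (packed rows) contributing to one optimizer step."""
    return micro * accum * gpus


def max_tokens_per_step(micro: int, accum: int, gpus: int, seq_len: int) -> int:
    """Upper bound on tokens per optimizer step (rows are at most seq_len long)."""
    return effective_batch_size(micro, accum, gpus) * seq_len


batch = effective_batch_size(micro_batch_size, gradient_accumulation_steps, num_gpus)
tokens = max_tokens_per_step(micro_batch_size, gradient_accumulation_steps, num_gpus, sequence_len)
print(batch, tokens)  # 8 16384
```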
76 changes: 76 additions & 0 deletions examples/voxtral/README.md

Collaborator

As an aside, I wonder if these example-level READMEs should be propagated to our docs somewhere.

Collaborator Author

I agree, we should have a "Model Guides" section which pulls the README for any existing models

@@ -0,0 +1,76 @@
# Finetune Voxtral with Axolotl

Voxtral is a [3B](https://huggingface.co/mistralai/Voxtral-Mini-3B-2507)/[24B](https://huggingface.co/mistralai/Voxtral-Small-24B-2507) parameter open-source model from MistralAI found on HuggingFace. This guide shows how to fine-tune it with Axolotl.

Thanks to the team at MistralAI for giving us early access to prepare for this release.

## Getting started

1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). Voxtral support is currently only available on main (nightly), so either install from main or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).

Here is an example of how to install from main for pip:

```bash
# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl

pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation -e '.[flash-attn]'
```

2. Install the audio dependencies:

```bash
# audio
pip3 install librosa==0.11.0
pip3 install 'mistral_common[audio]==1.8.3'
```

3. Run the finetuning example:

```bash
# text only
axolotl train examples/voxtral/voxtral-mini-qlora.yml

# text + audio
axolotl train examples/voxtral/voxtral-mini-audio-qlora.yml
```

These configs use about 4.8 GB VRAM.

Let us know how it goes. Happy finetuning! 🚀
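
Before launching a run, the pins from step 2 can be checked from Python. This is a minimal sketch using the standard library's `importlib.metadata`; `check_pins` is a hypothetical helper, and it assumes the installed distribution names match the names used in the `pip3 install` commands.

```python
from importlib import metadata

# Pins copied from step 2 above.
PINS = {"librosa": "0.11.0", "mistral_common": "1.8.3"}


def check_pins(pins: dict) -> dict:
    """Map each package to None if it matches its pin, else to a short message."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        report[name] = None if found == wanted else f"found {found}, expected {wanted}"
    return report


for name, problem in check_pins(PINS).items():
    print(f"{name}: {problem or 'ok'}")
```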

### TIPS

- For inference, the official MistralAI team recommends `temperature: 0.2` and `top_p: 0.95` for audio understanding and `temperature: 0.0` for transcription.
- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
- The text dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
- The multimodal dataset format follows the OpenAI multi-content Messages format as seen [here](https://docs.axolotl.ai/docs/multimodal.html#dataset-format).
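
The inference recommendation in the first tip can be captured as a small lookup. This is only a sketch: `voxtral_sampling_params` and the task names `audio_understanding` and `transcription` are hypothetical labels chosen for illustration, and how the returned dict is passed to an inference server depends on the client you use.

```python
def voxtral_sampling_params(task: str) -> dict:
    """Return the sampling settings recommended in the tip above for `task`."""
    if task == "audio_understanding":
        return {"temperature": 0.2, "top_p": 0.95}
    if task == "transcription":
        return {"temperature": 0.0}
    raise ValueError(f"unknown task: {task!r}")


print(voxtral_sampling_params("audio_understanding"))
print(voxtral_sampling_params("transcription"))
```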


## Optimization Guides

- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)

## Limitations

At the moment, the `mistral-common` tokenizer is supported only for Supervised Fine-tuning and only with `type: chat_template`.

In addition, we do not support overriding tokens yet.
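
The dataset-type limitation can be checked up front on a config that has already been loaded into a dict. This is a hypothetical pre-flight check, not Axolotl's own validation: `check_mistral_common_config` is an illustrative name, and the sketch covers only the `type: chat_template` rule, not token overrides.

```python
def check_mistral_common_config(cfg: dict) -> list:
    """Return a list of problems for a config that enables mistral-common."""
    problems = []
    if not cfg.get("tokenizer_use_mistral_common"):
        return problems  # the limitations only apply to mistral-common
    for i, ds in enumerate(cfg.get("datasets", [])):
        if ds.get("type") != "chat_template":
            problems.append(
                f"datasets[{i}]: type must be chat_template, got {ds.get('type')!r}"
            )
    return problems


ok_cfg = {
    "tokenizer_use_mistral_common": True,
    "datasets": [{"path": "Nanobit/text-think-2k-test", "type": "chat_template"}],
}
bad_cfg = {
    "tokenizer_use_mistral_common": True,
    "datasets": [{"path": "some/dataset", "type": "alpaca"}],  # placeholder path
}
print(check_mistral_common_config(ok_cfg))
print(check_mistral_common_config(bad_cfg))
```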

## Related Resources

- [MistralAI Magistral Blog](https://mistral.ai/news/magistral/)
- [Axolotl Docs](https://docs.axolotl.ai)
- [Axolotl Website](https://axolotl.ai)
- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)

## Future Work

- Add parity to Preference Tuning, RL, etc.
- Add parity to other tokenizer configs like overriding tokens.