Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
53 changes: 53 additions & 0 deletions examples/arcee/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# Finetune ArceeAI's AFM with Axolotl

[Arcee Foundation Models (AFM)](https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473) are a family of 4.5B parameter open weight models trained by Arcee.ai.

This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.

Thanks to the team at Arcee.ai for using Axolotl in supervised fine-tuning the AFM model.

## Getting started

1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as AFM is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).

Here is an example of how to install from main for pip:

```bash
# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
git clone https://github.com/axolotl-ai-cloud/axolotl.git
cd axolotl

pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation -e '.[flash-attn]'
```

2. Run the finetuning example:

```bash
axolotl train examples/arcee/afm-4.5b-qlora.yaml
```

This config uses about 7.8GiB VRAM.

Let us know how it goes. Happy finetuning! 🚀

### TIPS

- For inference, the official Arcee.ai team recommends `top_p: 0.95`, `temperature: 0.5`, `top_k: 50`, and `repeat_penalty: 1.1`.
- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
- Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).

## Optimization Guides

- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)

## Related Resources

- [AFM Blog](https://docs.arcee.ai/arcee-foundation-models/introduction-to-arcee-foundation-models)
- [Axolotl Docs](https://docs.axolotl.ai)
- [Axolotl Website](https://axolotl.ai)
- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
64 changes: 64 additions & 0 deletions examples/arcee/afm-4.5b-qlora.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
base_model: arcee-ai/AFM-4.5B

# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

load_in_8bit: false
load_in_4bit: true

datasets:
- path: fozziethebeat/alpaca_messages_2k_test
type: chat_template

dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./outputs/lora-out

adapter: qlora
lora_model_dir:

Comment on lines +21 to +22

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Unset lora_model_dir may override CLI resume logic

lora_model_dir: is present but empty. Axolotl interprets an empty string as “use the same directory as output_dir”, which can silently overwrite checkpoints when resuming. If that’s intentional, drop the key; otherwise set an explicit path.

- lora_model_dir:
+# lora_model_dir: ./outputs/afm-4.5b-qlora-adapter
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
lora_model_dir:
lora_model_dir: ./outputs/afm-4.5b-qlora-adapter
🤖 Prompt for AI Agents
In examples/arcee/afm-4.5b-qlora.yaml around lines 21 to 22, the key
`lora_model_dir` is present but set to an empty value, which Axolotl treats as
the same directory as `output_dir`, potentially overwriting checkpoints during
resume. To fix this, either remove the `lora_model_dir` key entirely if you want
to use the default resume behavior, or set it explicitly to a different
directory path to avoid accidental overwrites.

sequence_len: 2048
sample_packing: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

bf16: auto
tf32: false

gradient_checkpointing: true
resume_from_checkpoint:
logging_steps: 1
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 1
saves_per_epoch: 1

# save_first_step: true # uncomment this to validate checkpoint saving works with your config
2 changes: 1 addition & 1 deletion examples/colab-notebooks/colab-axolotl-example.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@
"%%capture\n",
"# This step can take ~5-10 minutes to install dependencies\n",
"!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
"!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@48b5169\""
"!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@bb8d9f8\""
]
},
{
Expand Down
1 change: 0 additions & 1 deletion examples/magistral/magistral-small-fsdp-qlora.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,6 @@ sequence_len: 2048
sample_packing: true
eval_sample_packing: false


lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
Expand Down
1 change: 0 additions & 1 deletion examples/magistral/magistral-small-qlora.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@ lora_model_dir:
sequence_len: 2048
sample_packing: true


lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
Expand Down
1 change: 0 additions & 1 deletion examples/magistral/magistral-small-think-qlora.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@ lora_model_dir:
sequence_len: 2048
sample_packing: true


lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
Expand Down
2 changes: 1 addition & 1 deletion scripts/cutcrossentropy_install.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,5 +29,5 @@

print(
UNINSTALL_PREFIX
+ f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@48b5169"'
+ f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@bb8d9f8"'
)
7 changes: 6 additions & 1 deletion src/axolotl/integrations/cut_cross_entropy/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ python scripts/cutcrossentropy_install.py | sh

- If you are installing from pip
```bash
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@48b5169"
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@bb8d9f8"
```

## Usage
Expand All @@ -31,6 +31,7 @@ plugins:

## Supported Models

- arcee
- cohere
- cohere2
- gemma
Expand All @@ -41,13 +42,17 @@ plugins:
- gemma3n_text
- glm
- glm4
- gpt_oss
- granite
- granitemoe
- hunyuan_v1_dense
- hunyuan_v1_moe
- llama
- llama4
- llama4_text
- mistral
- mistral3
- mixtral
- mllama
- phi
- phi3
Expand Down
2 changes: 1 addition & 1 deletion src/axolotl/integrations/cut_cross_entropy/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@

_CCE_INSTALL_MESSAGE = (
"Please install Axolotl's fork of cut_cross_entropy with transformers support using "
'`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@48b5169"`'
'`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@bb8d9f8"`'
)


Expand Down
1 change: 1 addition & 0 deletions src/axolotl/monkeypatch/multipack.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
"glm4",
"smollm3",
"gpt_oss",
"arcee",
]


Expand Down
Loading