-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Feat: add arcee #3028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Feat: add arcee #3028
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
451fe5f
feat: add arcee
NanoCode012 e1a221c
feat: add latest models supported by cce
NanoCode012 2759ad3
feat: add arcee example config
NanoCode012 f3c3538
chore: lint
NanoCode012 4a26eec
fix: typo
NanoCode012 fec5230
feat: change to instruct
NanoCode012 a2997d9
feat: add vram usage
NanoCode012 eb0f860
Update README.md
NanoCode012 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,53 @@ | ||
| # Finetune ArceeAI's AFM with Axolotl | ||
|
|
||
| [Arcee Foundation Models (AFM)](https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473) are a family of 4.5B parameter open weight models trained by Arcee.ai. | ||
|
|
||
| This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking. | ||
|
|
||
| Thanks to the team at Arcee.ai for using Axolotl in supervised fine-tuning the AFM model. | ||
|
|
||
| ## Getting started | ||
|
|
||
| 1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as AFM is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html). | ||
|
|
||
| Here is an example of how to install from main for pip: | ||
|
|
||
| ```bash | ||
| # Ensure you have Pytorch installed (Pytorch 2.6.0 min) | ||
| git clone https://github.com/axolotl-ai-cloud/axolotl.git | ||
| cd axolotl | ||
|
|
||
| pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja | ||
| pip3 install --no-build-isolation -e '.[flash-attn]' | ||
| ``` | ||
|
|
||
| 2. Run the finetuning example: | ||
|
|
||
| ```bash | ||
| axolotl train examples/arcee/afm-4.5b-qlora.yaml | ||
| ``` | ||
|
|
||
| This config uses about 7.8GiB VRAM. | ||
|
|
||
| Let us know how it goes. Happy finetuning! 🚀 | ||
|
|
||
| ### TIPS | ||
|
|
||
| - For inference, the official Arcee.ai team recommends `top_p: 0.95`, `temperature: 0.5`, `top_k: 50`, and `repeat_penalty: 1.1`. | ||
| - You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config. | ||
| - Read more on how to load your own dataset at [docs](https://docs.axolotl.ai/docs/dataset_loading.html). | ||
| - The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template). | ||
|
|
||
| ## Optimization Guides | ||
|
|
||
| - [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html) | ||
| - [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html) | ||
| - [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html) | ||
|
|
||
| ## Related Resources | ||
|
|
||
| - [AFM Blog](https://docs.arcee.ai/arcee-foundation-models/introduction-to-arcee-foundation-models) | ||
| - [Axolotl Docs](https://docs.axolotl.ai) | ||
| - [Axolotl Website](https://axolotl.ai) | ||
| - [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl) | ||
| - [Axolotl Discord](https://discord.gg/7m9sfhzaf3) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,64 @@ | ||
| base_model: arcee-ai/AFM-4.5B | ||
|
|
||
| # Automatically upload checkpoint and final model to HF | ||
| # hub_model_id: username/custom_model_name | ||
|
|
||
| plugins: | ||
| - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin | ||
|
|
||
| load_in_8bit: false | ||
| load_in_4bit: true | ||
|
|
||
| datasets: | ||
| - path: fozziethebeat/alpaca_messages_2k_test | ||
| type: chat_template | ||
|
|
||
| dataset_prepared_path: last_run_prepared | ||
| val_set_size: 0.1 | ||
| output_dir: ./outputs/lora-out | ||
|
|
||
| adapter: qlora | ||
| lora_model_dir: | ||
|
|
||
| sequence_len: 2048 | ||
| sample_packing: true | ||
|
|
||
| lora_r: 32 | ||
| lora_alpha: 16 | ||
| lora_dropout: 0.05 | ||
| lora_target_linear: true | ||
| lora_target_modules: | ||
| - gate_proj | ||
| - down_proj | ||
| - up_proj | ||
| - q_proj | ||
| - v_proj | ||
| - k_proj | ||
| - o_proj | ||
|
|
||
| wandb_project: | ||
| wandb_entity: | ||
| wandb_watch: | ||
| wandb_name: | ||
| wandb_log_model: | ||
|
|
||
| gradient_accumulation_steps: 4 | ||
| micro_batch_size: 2 | ||
| num_epochs: 1 | ||
| optimizer: adamw_bnb_8bit | ||
| lr_scheduler: cosine | ||
| learning_rate: 0.0002 | ||
|
|
||
| bf16: auto | ||
| tf32: false | ||
|
|
||
| gradient_checkpointing: true | ||
| resume_from_checkpoint: | ||
| logging_steps: 1 | ||
| flash_attention: true | ||
|
|
||
| warmup_ratio: 0.1 | ||
| evals_per_epoch: 1 | ||
| saves_per_epoch: 1 | ||
|
|
||
| # save_first_step: true # uncomment this to validate checkpoint saving works with your config | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,7 @@ | |
| "glm4", | ||
| "smollm3", | ||
| "gpt_oss", | ||
| "arcee", | ||
| ] | ||
|
|
||
|
|
||
|
|
||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Unset
lora_model_dirmay override CLI resume logiclora_model_dir:is present but empty. Axolotl interprets an empty string as “use the same directory asoutput_dir”, which can silently overwrite checkpoints when resuming. If that’s intentional, drop the key; otherwise set an explicit path.📝 Committable suggestion
🤖 Prompt for AI Agents