Feat: add arcee #3028
📝 Walkthrough
This update introduces a new README and YAML configuration for fine-tuning ArceeAI's AFM-4.5B model, updates the cut-cross-entropy package commit hash in multiple locations, adds "arcee" to supported model types, and makes minor formatting corrections to Magistral YAML files. The cut-cross-entropy documentation now includes additional supported models.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 3
🧹 Nitpick comments (6)
src/axolotl/monkeypatch/multipack.py (1)
13-40: Maintain alphabetical ordering of `SUPPORTED_MULTIPACK_MODEL_TYPES` to ease future diffs
`"arcee"` was appended at the end of the list, breaking the alphabetical (and grouped-by-family) ordering that makes the list easy to scan and keeps merge conflicts small.
Consider re-inserting the new entry in its sorted position (after `"deepseek_v3"` and before `"falcon"`).
    @@
    - "deepseek_v3",
    - "glm",
    + "arcee",
    + "deepseek_v3",
    + "glm",
    @@
    - "smollm3",
    - "arcee",
    + "smollm3",
src/axolotl/integrations/cut_cross_entropy/README.md (2)
22-23: Minor: escape the commit hash with back-ticks for consistency
All other code-style snippets use back-ticks around the command – the second `pip3` line would read cleaner with them.
    -pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@bb8d9f8"
    +pip3 uninstall -y cut-cross-entropy && \
    +pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@bb8d9f8"
34-56: Consider alphabetising the Supported Models list
The growing list is becoming hard to scan. Alphabetical ordering (or grouping by vendor) would make future additions trivial and avoid accidental duplicates.
Not blocking – informational only.
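The ordering and duplicate concerns raised for `SUPPORTED_MULTIPACK_MODEL_TYPES` and the Supported Models list could be checked mechanically. The following is a minimal Python sketch, assuming the entries are plain strings; the helper name `check_sorted_unique` and the sample list are hypothetical and are not part of Axolotl.

```python
from collections import Counter


def check_sorted_unique(names):
    """Return (is_sorted, duplicates) for a list of model-type strings."""
    is_sorted = list(names) == sorted(names)
    duplicates = sorted(n for n, count in Counter(names).items() if count > 1)
    return is_sorted, duplicates


# Hypothetical excerpt: "arcee" appended at the end breaks the ordering.
sample = ["deepseek_v3", "falcon", "glm", "smollm3", "arcee"]
print(check_sorted_unique(sample))
print(check_sorted_unique(sorted(sample)))
```

A check of this kind could run in a unit test or pre-commit hook, so that a new entry appended out of order is flagged before review.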
examples/arcee/README.md (2)
30-30: Fill the VRAM placeholder
`This config uses about (---) VRAM.` still contains a placeholder. Replace with an approximate value (e.g., “≈ 24 GB” on A100-80GB) so users can gauge hardware needs.
11-11: Minor phrasing tweak for clarity
“You need to install from main as AFM is only on nightly or use our latest Docker images” is hard to parse. Consider:
“Install Axolotl from the `main` branch (nightly) or pull our latest Docker image, because AFM support hasn’t landed in a stable release yet.”
examples/arcee/afm-4.5b-qlora.yaml (1)
52-53: `bf16: auto` & `tf32: false` – document hardware expectations
`bf16: auto` silently falls back to fp16 if the GPU lacks BF16; disabling TF32 may hurt throughput on Ampere/Hopper. Add a comment so users know why TF32 is disabled and whether BF16 fallback is acceptable.
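The hardware expectation behind this comment can be written down as a small lookup. This is a sketch only, assuming that native BF16 and TF32 support both begin at CUDA compute capability 8.0 (Ampere); the function name `native_precision_support` is hypothetical, and the code does not reproduce how Axolotl resolves `bf16: auto`.

```python
def native_precision_support(compute_capability):
    """Map a CUDA compute capability (major, minor) to native BF16/TF32 support.

    Assumption: both formats are treated as available from compute
    capability 8.0 (Ampere) onward; older GPUs are treated as lacking them.
    """
    major, _minor = compute_capability
    ampere_or_newer = major >= 8
    return {"bf16": ampere_or_newer, "tf32": ampere_or_newer}


print(native_precision_support((7, 5)))  # e.g. a Turing-class GPU
print(native_precision_support((8, 0)))  # e.g. an A100
```

With PyTorch installed, the tuple could be obtained from `torch.cuda.get_device_capability()`.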
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- examples/arcee/README.md (1 hunks)
- examples/arcee/afm-4.5b-qlora.yaml (1 hunks)
- examples/colab-notebooks/colab-axolotl-example.ipynb (1 hunks)
- examples/magistral/magistral-small-fsdp-qlora.yaml (0 hunks)
- examples/magistral/magistral-small-qlora.yaml (0 hunks)
- examples/magistral/magistral-small-think-qlora.yaml (0 hunks)
- scripts/cutcrossentropy_install.py (1 hunks)
- src/axolotl/integrations/cut_cross_entropy/README.md (3 hunks)
- src/axolotl/integrations/cut_cross_entropy/__init__.py (1 hunks)
- src/axolotl/monkeypatch/multipack.py (1 hunks)
💤 Files with no reviewable changes (3)
- examples/magistral/magistral-small-fsdp-qlora.yaml
- examples/magistral/magistral-small-think-qlora.yaml
- examples/magistral/magistral-small-qlora.yaml
🧰 Additional context used
🪛 LanguageTool
examples/arcee/README.md
[style] ~31-~31: Consider using polite language here.
Context: ...`` This config uses about (---) VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: preview
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: pre-commit
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: pre-commit
🔇 Additional comments (6)
src/axolotl/monkeypatch/multipack.py (1)
55-57: Double-check that “arcee” needs no special multipack patch
`patch_for_multipack()` currently triggers extra logic only for `"mixtral"` when DeepSpeed-ZeRO-3 is enabled.
If the Arcee model family requires bespoke forwarding / `_get_unpad_data` handling similar to Mixtral or Qwen-MOE, you may need to add a dedicated branch here (or in `cut_cross_entropy`) to avoid silent performance penalties or shape mismatches.
Please confirm that the stock causal-LM forward path suffices; otherwise, add an explicit patch or TODO.
src/axolotl/integrations/cut_cross_entropy/__init__.py (1)
35-38: Verify commit bb8d9f8 exists and retains Axolotl patches
Hard-pinning to a specific commit helps with reproducibility, but CI cannot find `bb8d9f8` in the `axolotl-ai-cloud/ml-cross-entropy` repo. Please:
- Confirm that commit `bb8d9f8` has been pushed to GitHub under `axolotl-ai-cloud/ml-cross-entropy`.
- Once available, verify that it still defines: `AXOLOTL_CCE_FORK = True`
- Exports `cut_cross_entropy.transformers.patch.cce_patch`
Without the correct hash and these patches in place, the installation instructions will break.
scripts/cutcrossentropy_install.py (1)
30-33: Hash consistency verified – no stale references found
Ran `rg --fixed-strings 48b5169` and confirmed there are no remaining occurrences of the old hash. The new hash `bb8d9f8` is used consistently in `scripts/cutcrossentropy_install.py`, the README, and `__init__.py`. No further action required.
examples/colab-notebooks/colab-axolotl-example.ipynb (1)
42-44: Pinned SHA bumped: validate downstream impact
The `cut-cross-entropy` install now targets commit `bb8d9f8`. Ensure:
- That SHA still exposes the `transformers` extra (install will break otherwise).
- Any breaking API/CLI changes between `48b5169` → `bb8d9f8` are reflected in later notebook cells and docs.
If you have already run the notebook end-to-end after this bump, all good.
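A search for stale references to the old hash, like the `rg --fixed-strings 48b5169` run mentioned in this review, can also be sketched in Python. The helper `find_hash_references`, the suffix list, and the demo file names below are illustrative assumptions and are not code from the PR.

```python
import tempfile
from pathlib import Path


def find_hash_references(root, commit_hash, suffixes=(".py", ".md", ".ipynb", ".yaml")):
    """Return sorted relative paths of files under root whose text contains commit_hash."""
    root = Path(root)
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if commit_hash in text:
                hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


# Demo on a throwaway directory (file names here are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "install.py").write_text("ml-cross-entropy.git@bb8d9f8\n", encoding="utf-8")
    Path(tmp, "README.md").write_text("ml-cross-entropy.git@48b5169\n", encoding="utf-8")
    stale = find_hash_references(tmp, "48b5169")
    current = find_hash_references(tmp, "bb8d9f8")
    print("stale:", stale)
    print("current:", current)
```

An empty `stale` list after a bump would correspond to the "no stale references" result reported above.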
examples/arcee/README.md (1)
16-16: Unrealistic PyTorch version requirement
`# Ensure you have Pytorch installed (Pytorch 2.6.0 min)` references a release that does not exist upstream (latest stable is 2.2.x).
If unreleased nightlies are genuinely required, state that explicitly and link to the correct install command; otherwise adjust the version floor.
examples/arcee/afm-4.5b-qlora.yaml (1)
17-18: `val_set_size` datatype
`val_set_size: 0.1` is parsed as a float (10 %) in recent Axolotl versions, but older releases expected an integer sample count. Verify you’re on a commit that supports float fractions; otherwise training will crash on schema validation.
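The float-versus-integer distinction can be made concrete with a small helper. This sketch assumes the common convention (used, for example, by the `test_size` argument of Hugging Face `datasets.Dataset.train_test_split`) that a float below 1 is a fraction of the dataset and an integer is an absolute row count. The function `resolve_val_count` is hypothetical and is not Axolotl's schema validation.

```python
def resolve_val_count(val_set_size, num_rows):
    """Turn a val_set_size setting into an absolute number of validation rows."""
    if isinstance(val_set_size, bool):
        raise TypeError("val_set_size must be a number, not a bool")
    if isinstance(val_set_size, float):
        if not 0.0 <= val_set_size < 1.0:
            raise ValueError("a float val_set_size must be a fraction in [0, 1)")
        return round(num_rows * val_set_size)
    if isinstance(val_set_size, int):
        if not 0 <= val_set_size <= num_rows:
            raise ValueError("an int val_set_size must be between 0 and num_rows")
        return val_set_size
    raise TypeError("val_set_size must be an int or a float")


print(resolve_val_count(0.1, 2000))  # fraction of the dataset
print(resolve_val_count(100, 2000))  # absolute sample count
```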
🛠️ Refactor suggestion
Unset lora_model_dir may override CLI resume logic
`lora_model_dir:` is present but empty. Axolotl interprets an empty string as “use the same directory as `output_dir`”, which can silently overwrite checkpoints when resuming. If that’s intentional, drop the key; otherwise set an explicit path.
    - lora_model_dir:
    +# lora_model_dir: ./outputs/afm-4.5b-qlora-adapter
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    - lora_model_dir:
    + lora_model_dir: ./outputs/afm-4.5b-qlora-adapter
🤖 Prompt for AI Agents
In examples/arcee/afm-4.5b-qlora.yaml around lines 21 to 22, the key
`lora_model_dir` is present but set to an empty value, which Axolotl treats as
the same directory as `output_dir`, potentially overwriting checkpoints during
resume. To fix this, either remove the `lora_model_dir` key entirely if you want
to use the default resume behavior, or set it explicitly to a different
directory path to avoid accidental overwrites.
📖 Documentation Preview: https://6895e8725f3f8d711fcf3adf--resonant-treacle-0fd729.netlify.app
Deployed on Netlify from commit eb0f860
Codecov Report
✅ All modified and coverable lines are covered by tests.
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/arcee/README.md (1)
3-3: Fix malformed Markdown link
The link target is wrapped in doubled parentheses, so Markdown renders it as plain text instead of a hyperlink.
    -[Arcee Foundation Models (AFM)]((https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473))
    +[Arcee Foundation Models (AFM)](https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473)
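Links malformed in this way can be found and repaired with a regular expression. The sketch below is one possible approach, assuming link targets contain no whitespace or parentheses; the names `DOUBLE_PAREN_LINK` and `fix_double_paren_links` are hypothetical.

```python
import re

# Matches an inline link whose target is wrapped in an extra pair of parentheses,
# e.g. "[text]((https://example.com))".
DOUBLE_PAREN_LINK = re.compile(r"\[([^\]]+)\]\(\((\S+?)\)\)")


def fix_double_paren_links(markdown):
    """Rewrite "[text]((url))" as "[text](url)" and leave other text unchanged."""
    return DOUBLE_PAREN_LINK.sub(r"[\1](\2)", markdown)


broken = (
    "[Arcee Foundation Models (AFM)]"
    "((https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473))"
)
print(fix_double_paren_links(broken))
```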
🧹 Nitpick comments (1)
examples/arcee/README.md (1)
1-1: Hyphenate “Fine-tune” in the header
Style nit: “Fine-tune” is normally hyphenated when used as a verb-noun compound in titles.
    -# Finetune ArceeAI's AFM with Axolotl
    +# Fine-tune ArceeAI's AFM with Axolotl
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/arcee/README.md (1 hunks)
- examples/arcee/afm-4.5b-qlora.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/arcee/afm-4.5b-qlora.yaml
🧰 Additional context used
🪛 LanguageTool
examples/arcee/README.md
[style] ~31-~31: Consider using polite language here.
Context: ...` This config uses about 7.8GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: pre-commit
- GitHub Check: pre-commit
- GitHub Check: preview
🔇 Additional comments (1)
examples/arcee/README.md (1)
16-17: Verify PyTorch version and fix casing
“PyTorch 2.6.0” does not yet exist (latest stable ≤ 2.x). This may confuse users. Also capitalise “PyTorch”.
    -# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
    +# Ensure you have PyTorch installed (recommended ≥ 2.1.0 – update this when 2.6.0 is released)
Please confirm the minimum supported version and update the doc accordingly.
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/arcee/README.md (2)
1-1: Unify “fine-tune” spelling for consistency & professionalism
The doc alternates between “Finetune/finetuning” and the standard “fine-tune/fine-tuning”. Standardising improves polish and avoids search/grep misses.
    -# Finetune ArceeAI's AFM with Axolotl
    +# Fine-tune ArceeAI's AFM with Axolotl
    @@
    -This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.
    +This guide shows how to fine-tune it with Axolotl using multi-turn conversations and proper masking.
    @@
    -Let us know how it goes. Happy finetuning! 🚀
    +Let us know how it goes. Happy fine-tuning! 🚀
Also applies to: 5-5, 32-32
20-21: Hard-pinning build tools may create avoidable conflicts
Locking `packaging==23.2`, `setuptools==75.8.0`, and `wheel` can clash with users’ existing environments and future PyPI security fixes. Unless a specific bug is being avoided, consider relaxing or dropping these pins.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- examples/arcee/README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/arcee/README.md
[style] ~31-~31: Consider using polite language here.
Context: ...` This config uses about 7.8GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
🔇 Additional comments (2)
examples/arcee/README.md (2)
16-17: PyTorch 2.6.0 doesn’t exist yet: please verify the minimum version
Current upstream releases are 2.3.x. Referencing 2.6.0 may confuse users and break automated setup scripts.
3-3: Markdown link fixed – looks good
The earlier extra bracket has been removed; the link now renders correctly.
Actionable comments posted: 0
🧹 Nitpick comments (12)
examples/arcee/README.md (12)
1-1: Hyphenate and unify branding in the title
Minor wording polish for consistency with the rest of the doc (“Arcee.ai”) and standard hyphenation.
    -# Finetune ArceeAI's AFM with Axolotl
    +# Fine-tune Arcee.ai's AFM with Axolotl
3-3: Grammar: “4.5B-parameter open-weight models”
Hyphenate compound modifiers and numbers used adjectivally.
    -[Arcee Foundation Models (AFM)](https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473) are a family of 4.5B parameter open weight models trained by Arcee.ai.
    +[Arcee Foundation Models (AFM)](https://huggingface.co/collections/arcee-ai/afm-45b-68823397c351603014963473) are a family of 4.5B-parameter, open-weight models trained by Arcee.ai.
5-5: Tighten sentence and avoid double “with”
Small readability tweak.
    -This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.
    +This guide shows how to fine-tune AFM with Axolotl for multi-turn conversations with proper conversation masking.
7-7: Fix phrasing for supervised fine-tuning
    -Thanks to the team at Arcee.ai for using Axolotl in supervised fine-tuning the AFM model.
    +Thanks to the Arcee.ai team for using Axolotl for supervised fine-tuning of the AFM model.
11-11: Clarify “nightly” wording
Make it explicit that AFM support is on main/nightly builds or Docker.
    -1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as AFM is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
    +1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). Install from the main branch (nightly), where AFM support is currently available, or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
16-22: Installation accuracy and FlashAttention build notes
- PyTorch/CUDA/FlashAttention compatibility is hardware- and environment-dependent; the “2.6.0 min” claim may not be universally correct. Consider softening and pointing to the support matrix.
- Pessimistic pinning of `packaging==23.2` and a very specific `setuptools==75.8.0` can cause avoidable conflicts. Prefer upgrading tooling without strict pins unless required.
    -# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
    +# Ensure you have PyTorch installed (verify supported CUDA/driver versions for your GPU; see Axolotl/FlashAttention docs)
    @@
    -pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
    +pip3 install -U pip setuptools wheel ninja
Optional follow-up additions (no diff shown):
- Note: FlashAttention generally requires Ampere/Hopper GPUs and a matching CUDA toolkit. If building `flash-attn` fails, try installing without the `flash-attn` extra, or install prebuilt wheels where available.
- If needed, export `CUDA_HOME` and ensure `nvcc` is on PATH.
30-30: Qualify the VRAM figure with hardware and config context
VRAM usage depends on GPU, seq length, micro-batch size, gradient checkpointing, quantization, etc. Add the measurement context (GPU model, seq_len, micro_batch_size) or rephrase as an estimate.
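To show why a single VRAM figure needs context, here is a back-of-the-envelope estimate of the memory taken by the weights alone. It is a sketch under stated assumptions: the parameter count is taken from the model name (4.5B), and quantization metadata, adapter weights, optimizer state, activations, and CUDA overhead are all ignored, so it gives a lower bound and does not reproduce the 7.8GiB figure quoted from the README.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


params = 4.5e9  # assumption: parameter count implied by the name "AFM-4.5B"
print(f"4-bit weights: {weight_memory_gib(params, 4):.2f} GiB")
print(f"bf16 weights:  {weight_memory_gib(params, 16):.2f} GiB")
```

The gap between the weights-only number and a measured total is where sequence length, micro-batch size, and gradient checkpointing come in.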
32-32: Hyphenate “fine-tuning”
Also addresses the stylistic hint flagged by static analysis.
    -Let us know how it goes. Happy finetuning! 🚀
    +Let us know how it goes. Happy fine-tuning! 🚀
34-34: Heading case
Match typical style used elsewhere in the docs.
    -### TIPS
    +### Tips
36-38: Wording and hyphenation; add a caution for full FT
Polish wording and suggest noting that full FT may require adjusting memory-sensitive params.
    -- For inference, the official Arcee.ai team recommends `top_p: 0.95`, `temperature: 0.5`, `top_k: 50`, and `repeat_penalty: 1.1`.
    -- You can run a full finetuning by removing the `adapter: qlora` and `load_in_4bit: true` from the config.
    +- For inference, the Arcee.ai team recommends `top_p: 0.95`, `temperature: 0.5`, `top_k: 50`, and `repeat_penalty: 1.1`.
    +- To run full fine-tuning, remove `adapter: qlora` and `load_in_4bit: true` from the config.
Follow-up (no diff): Add a note that full FT typically increases VRAM needs and may require adjusting `micro_batch_size`, `gradient_accumulation_steps`, and enabling gradient checkpointing.
38-39: Dataset link + quick example
- Consider adding a minimal OpenAI Messages JSON example for clarity.
- Verify that the anchor `#chat_template` is correct for the linked page.
Example snippet to include:
    [
      {
        "messages": [
          { "role": "system", "content": "You are a helpful assistant." },
          { "role": "user", "content": "Explain QLoRA in one sentence." },
          { "role": "assistant", "content": "QLoRA fine-tunes a 4-bit quantized model using low-rank adapters to reduce memory usage." }
        ]
      }
    ]
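A record in the shape of the example snippet could be sanity-checked before training. The following is a minimal sketch, assuming only the three roles that appear in the snippet; the helper `validate_record` and the `ALLOWED_ROLES` set are hypothetical and are not part of Axolotl's dataset loading.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption: roles used in the example snippet


def validate_record(record):
    """Return a list of problems found in one {"messages": [...]} record."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i}: must be an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i}: 'content' must be a string")
    return problems


raw = """[{"messages": [
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Explain QLoRA in one sentence."},
  {"role": "assistant", "content": "QLoRA fine-tunes a 4-bit quantized model using low-rank adapters to reduce memory usage."}
]}]"""
for record in json.loads(raw):
    print(validate_record(record))
```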
41-46: Add optimization link: Cut Cross-Entropy (token pruning)
Since the PR updates CCE integration, surface it here to help users train/evaluate faster.
     ## Optimization Guides
    @@
     - [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
     - [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
     - [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
    + - Cut Cross-Entropy (token pruning): see the integration README in this repo (verify link target)
    + - Example link: src/axolotl/integrations/cut_cross_entropy/README.md
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- examples/arcee/README.md (1 hunks)
- examples/arcee/afm-4.5b-qlora.yaml (1 hunks)
- examples/colab-notebooks/colab-axolotl-example.ipynb (1 hunks)
- examples/magistral/magistral-small-fsdp-qlora.yaml (0 hunks)
- examples/magistral/magistral-small-qlora.yaml (0 hunks)
- examples/magistral/magistral-small-think-qlora.yaml (0 hunks)
- scripts/cutcrossentropy_install.py (1 hunks)
- src/axolotl/integrations/cut_cross_entropy/README.md (3 hunks)
- src/axolotl/integrations/cut_cross_entropy/__init__.py (1 hunks)
- src/axolotl/monkeypatch/multipack.py (1 hunks)
💤 Files with no reviewable changes (3)
- examples/magistral/magistral-small-fsdp-qlora.yaml
- examples/magistral/magistral-small-think-qlora.yaml
- examples/magistral/magistral-small-qlora.yaml
✅ Files skipped from review due to trivial changes (3)
- scripts/cutcrossentropy_install.py
- src/axolotl/integrations/cut_cross_entropy/__init__.py
- examples/colab-notebooks/colab-axolotl-example.ipynb
🚧 Files skipped from review as they are similar to previous changes (3)
- src/axolotl/monkeypatch/multipack.py
- src/axolotl/integrations/cut_cross_entropy/README.md
- examples/arcee/afm-4.5b-qlora.yaml
🧰 Additional context used
🪛 LanguageTool
examples/arcee/README.md
[style] ~31-~31: Consider using polite language here.
Context: ...` This config uses about 7.8GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...
(INSERT_PLEASE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.0)
- GitHub Check: pre-commit
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.0)
- GitHub Check: pre-commit
- GitHub Check: preview
🔇 Additional comments (1)
examples/arcee/README.md (1)
27-27: ✅ Configuration Verified
The file `examples/arcee/afm-4.5b-qlora.yaml` was found and successfully parsed, containing the required keys (`base_model`, `datasets`, `adapter`). No further changes are needed.
Description
Arcee.ai's AFM model was trained with Axolotl. This PR adds configs for fine-tuning it.
Motivation and Context
How has this been tested?
Manual run working.
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores