Feat: add MiMo and Plano by NanoCode012 · Pull Request #3332 · axolotl-ai-cloud/axolotl

NanoCode012 · 2025-12-24T09:31:48Z

Description

MiMo is an older model by Xiaomi.

Plano is a model built on the Qwen3 and Qwen3MoE architectures.

Summary by CodeRabbit

New Features
- Added fine-tuning support for Xiaomi MiMo-7B models with QLoRA configuration
- Added fine-tuning support for Plano-Orchestrator-4B models with QLoRA configuration
Documentation
- Added comprehensive guides for MiMo and Plano-Orchestrator fine-tuning including VRAM recommendations
- Documented Cut Cross Entropy limitations for Trinity and MiMo models

coderabbitai · 2025-12-24T09:31:56Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This pull request adds example configurations and documentation for two new models (MiMo and Plano-Orchestrator) with QLoRA fine-tuning setups, updates the main README to reference these examples, adds a limitations note to the Trinity example, and includes a model revision field in the Trinity configuration.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| MiMo Example Configuration: `examples/mimo/README.md`, `examples/mimo/mimo-7b-qlora.yaml` | New documentation and QLoRA configuration for fine-tuning MiMo-7B-RL model with Axolotl, including training commands, VRAM considerations, and dataset references. |
| Plano Example Configuration: `examples/plano/README.md`, `examples/plano/plano-4b-qlora.yaml` | New documentation and QLoRA configuration for fine-tuning Plano-Orchestrator 4B model, including orchestration prompt guidance and LoRA projection modules. |
| Trinity Example Updates: `examples/trinity/README.md`, `examples/trinity/trinity-nano-preview-qlora.yaml` | Added limitations section documenting lack of Cut Cross Entropy support and added model revision field (2ee94b0) to configuration. |
| Main Documentation: `README.md` | Updated 2025/12 latest updates entry to reference new MiMo and Plano-Orchestrator examples alongside existing Olmo3, Trinity, and Ministral3. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

#3292: Overlapping modifications to Trinity example files (README and configuration).
#3297: Parallel example file additions for Ministral3 configuration.

Suggested labels

ready to merge

Suggested reviewers

winglian
SalmanMohammadi

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Feat: add MiMo and Plano' clearly and concisely summarizes the main change: adding two new model examples (MiMo and Plano) to the repository. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

examples/plano/plano-4b-qlora.yaml (1)
1-65: Configuration looks good, but consider pinning a model revision.

The QLoRA configuration is well-structured with appropriate parameters. The use of Cut Cross Entropy plugin aligns with the README guidance.

Consider adding revision_of_model for reproducibility, similar to the Trinity and MiMo examples:
 base_model: katanemo/Plano-Orchestrator-4B
+revision_of_model: <commit_hash>
This ensures consistent behavior across training runs.
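
The pinning suggestion can be checked mechanically. The following is a minimal sketch, assuming the YAML config has already been parsed into a plain dict. `is_pinned` is a hypothetical helper and not part of Axolotl, and the rule of 7 to 40 hexadecimal digits is an assumption about what a commit-like revision looks like.

```python
import re

# Assumption: a pinned revision is a short or full Git commit hash (7-40 hex digits).
_COMMIT_HASH = re.compile(r"^[0-9a-f]{7,40}$")


def is_pinned(config: dict) -> bool:
    """Return True if the config pins the base model to a commit-like revision."""
    revision = config.get("revision_of_model")
    return isinstance(revision, str) and bool(_COMMIT_HASH.match(revision))


# Hypothetical parsed configs mirroring the examples discussed in this review.
plano = {"base_model": "katanemo/Plano-Orchestrator-4B"}
trinity = {
    "base_model": "arcee-ai/Trinity-Nano-Preview",
    "revision_of_model": "2ee94b0",
}

print("plano pinned:", is_pinned(plano))
print("trinity pinned:", is_pinned(trinity))
```

A branch name such as `main` is deliberately rejected here, since a branch can move between training runs while a commit hash cannot.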
examples/mimo/mimo-7b-qlora.yaml (1)
15-17: Consider explicitly specifying chat_template.

The dataset type is chat_template but no explicit chat_template field is specified (unlike the Plano example which uses chat_template: qwen3). If MiMo requires a specific chat template, it should be explicitly declared for clarity.

If a specific template is needed, add it explicitly:
 datasets:
   - path: fozziethebeat/alpaca_messages_2k_test
     type: chat_template
+    # chat_template: <template_name>  # Specify if needed
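
A small check along these lines could flag configs that rely on an implicit template. This is a sketch only: `datasets_missing_template` is a hypothetical helper, the config is assumed to be an already-parsed dict, and it assumes a template may be declared either at the top level or on the individual dataset entry.

```python
def datasets_missing_template(config: dict) -> list[str]:
    """List dataset paths that use type chat_template with no template declared."""
    # Assumption: a top-level chat_template applies to every dataset entry.
    top_level = config.get("chat_template")
    missing = []
    for ds in config.get("datasets", []):
        uses_chat_type = ds.get("type") == "chat_template"
        declared = top_level or ds.get("chat_template")
        if uses_chat_type and not declared:
            missing.append(ds.get("path", "<unknown>"))
    return missing


# Hypothetical parsed configs based on the two examples in this PR.
mimo = {
    "datasets": [
        {"path": "fozziethebeat/alpaca_messages_2k_test", "type": "chat_template"}
    ]
}
plano = {
    "chat_template": "qwen3",
    "datasets": [
        {"path": "fozziethebeat/alpaca_messages_2k_test", "type": "chat_template"}
    ],
}

print(datasets_missing_template(mimo))
print(datasets_missing_template(plano))
```

An empty result does not mean the chosen template is the right one for the model, only that one was stated explicitly.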

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2155ea and ad93346.

📒 Files selected for processing (7)

README.md
examples/mimo/README.md
examples/mimo/mimo-7b-qlora.yaml
examples/plano/README.md
examples/plano/plano-4b-qlora.yaml
examples/trinity/README.md
examples/trinity/trinity-nano-preview-qlora.yaml

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-22T13:23:41.455Z

Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The `lora_on_cpu` configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.

Applied to files:

examples/plano/plano-4b-qlora.yaml
examples/mimo/mimo-7b-qlora.yaml

🪛 LanguageTool

examples/mimo/README.md

[style] ~17-~17: Consider using polite language here.
Context: ... This config uses about 17.2 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

examples/plano/README.md

[style] ~19-~19: Consider using polite language here.
Context: ...` This config uses about 5.1 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

🪛 markdownlint-cli2 (0.18.1)

examples/mimo/README.md

23-23: Link text should be descriptive

(MD059, descriptive-link-text)

examples/plano/README.md

30-30: Link text should be descriptive

(MD059, descriptive-link-text)

🔇 Additional comments (7)

README.md (1)

32-32: LGTM! Clear documentation of new model support.

The update properly highlights the addition of Plano-Orchestrator and MiMo examples alongside existing models with correct links to their respective directories.

examples/mimo/README.md (1)

1-39: Well-documented guide for MiMo fine-tuning.

The README provides clear instructions and context about MiMo's Multiple-Token Prediction approach, with appropriate references to the paper and resources.

examples/mimo/mimo-7b-qlora.yaml (2)

1-67: Configuration looks good with proper revision pinning.

The QLoRA configuration is well-structured with trust_remote_code and pinned revision for reproducibility.

61-61: MiMo-7B-RL is a standard transformer model that supports flash attention through Axolotl's training framework. The configuration is valid and properly supported.

examples/trinity/trinity-nano-preview-qlora.yaml (1)

3-3: Good practice: pinning model revision for reproducibility.

The specific model revision 2ee94b0 is valid for the arcee-ai/Trinity-Nano-Preview model and ensures consistent behavior across runs.

examples/plano/README.md (1)

1-42: Documentation is accurate and well-structured.

The README provides clear fine-tuning instructions for Plano-Orchestrator. The stated 5.1 GiB VRAM usage for 4B + QLoRA is realistic and within expected ranges for this configuration. Industry documentation confirms QLoRA fine-tuning of 4B models typically consumes 3–5 GB VRAM, and the Cut Cross Entropy plugin further reduces loss computation overhead by chunking logit materialization rather than creating full vocab×tokens matrices. No verification concerns identified.
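
To give a sense of scale for the full vocab×tokens matrix mentioned above, here is a back-of-the-envelope sketch. The token count, vocabulary size, and fp32 precision are illustrative assumptions and are not taken from the PR or from the Plano config.

```python
GIB = 1024 ** 3


def full_logits_bytes(num_tokens: int, vocab_size: int, bytes_per_value: int) -> int:
    """Size in bytes of a dense (tokens x vocab) logit matrix."""
    return num_tokens * vocab_size * bytes_per_value


# Assumptions (illustrative only): 2048 tokens per step, a 151,936-entry
# vocabulary, and fp32 logits at 4 bytes per value.
dense = full_logits_bytes(num_tokens=2048, vocab_size=151_936, bytes_per_value=4)
print(f"dense logits: {dense / GIB:.2f} GiB")
```

Under these assumptions the dense matrix alone is roughly 1.16 GiB, which is a noticeable share of a budget near 5 GiB and illustrates why avoiding full materialization matters at this model size.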

examples/plano/plano-4b-qlora.yaml (1)

59-59: Plano-Orchestrator-4B supports flash attention, so the configuration is correct. The difference from the Trinity example (where flash attention is commented as "Not supported") reflects that these models have different attention capabilities—Trinity does not support flash attention while Plano-Orchestrator-4B does.

github-actions · 2025-12-24T09:45:09Z

📖 Documentation Preview: https://694d1ccdba8931729e083521--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 88e451b

NanoCode012 added 7 commits December 24, 2025 15:49

feat: add xiaomi's mimo 7b

85156cc

fix: pin revision

6bfa0de

fix: update trinity docs and pin revision

975b32a

fix: wrong config name

1ccb567

feat: add vram usage

54820a3

feat: add plano

b73e860

feat: update plano vram usage

ad93346

coderabbitai Bot reviewed Dec 24, 2025

Comment thread examples/trinity/README.md Outdated

NanoCode012 added 2 commits December 24, 2025 16:38

Merge branch 'main' into feat/mimo

f535eb1

chore: comments

eb19b04

NanoCode012 added the ready to merge label Dec 24, 2025

Merge branch 'main' into feat/mimo

88e451b

NanoCode012 merged commit 4f5e8a3 into main Dec 25, 2025
3 checks passed

NanoCode012 deleted the feat/mimo branch December 25, 2025 11:09

winglian removed the ready to merge label Mar 22, 2026

coderabbitai Bot mentioned this pull request Apr 2, 2026

fix(yaml): add cce and liger to nemotron-h example #3573

Merged

NanoCode012 merged 10 commits into main from feat/mimo
