Skip to content

Feat: add MiMo and Plano#3332

Merged
NanoCode012 merged 10 commits into
mainfrom
feat/mimo
Dec 25, 2025
Merged

Feat: add MiMo and Plano#3332
NanoCode012 merged 10 commits into
mainfrom
feat/mimo

Conversation

@NanoCode012

@NanoCode012 NanoCode012 commented Dec 24, 2025

Copy link
Copy Markdown
Collaborator

Description

MiMo is an older model by Xiaomi

image

Plano is a model built on Qwen3 and Qwen3MoE arch.

image

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • New Features

    • Added fine-tuning support for Xiaomi MiMo-7B models with QLoRA configuration
    • Added fine-tuning support for Plano-Orchestrator-4B models with QLoRA configuration
  • Documentation

    • Added comprehensive guides for MiMo and Plano-Orchestrator fine-tuning including VRAM recommendations
    • Documented Cut Cross Entropy limitations for Trinity and MiMo models

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai

coderabbitai Bot commented Dec 24, 2025

Copy link
Copy Markdown
Contributor

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This pull request adds example configurations and documentation for two new models (MiMo and Plano-Orchestrator) with QLoRA fine-tuning setups, updates the main README to reference these examples, adds a limitations note to the Trinity example, and includes a model revision field in the Trinity configuration.

Changes

Cohort / File(s) Change Summary
MiMo Example Configuration
examples/mimo/README.md, examples/mimo/mimo-7b-qlora.yaml
New documentation and QLoRA configuration for fine-tuning MiMo-7B-RL model with Axolotl, including training commands, VRAM considerations, and dataset references.
Plano Example Configuration
examples/plano/README.md, examples/plano/plano-4b-qlora.yaml
New documentation and QLoRA configuration for fine-tuning Plano-Orchestrator 4B model, including orchestration prompt guidance and LoRA projection modules.
Trinity Example Updates
examples/trinity/README.md, examples/trinity/trinity-nano-preview-qlora.yaml
Added limitations section documenting lack of Cut Cross Entropy support and added model revision field (2ee94b0) to configuration.
Main Documentation
README.md
Updated 2025/12 latest updates entry to reference new MiMo and Plano-Orchestrator examples alongside existing Olmo3, Trinity, and Ministral3.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • #3292: Overlapping modifications to Trinity example files (README and configuration).
  • #3297: Parallel example file additions for Ministral3 configuration.

Suggested labels

ready to merge

Suggested reviewers

  • winglian
  • SalmanMohammadi

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Feat: add MiMo and Plano' clearly and concisely summarizes the main change: adding two new model examples (MiMo and Plano) to the repository.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
examples/plano/plano-4b-qlora.yaml (1)

1-65: Configuration looks good, but consider pinning a model revision.

The QLoRA configuration is well-structured with appropriate parameters. The use of Cut Cross Entropy plugin aligns with the README guidance.

Consider adding revision_of_model for reproducibility, similar to the Trinity and MiMo examples:

 base_model: katanemo/Plano-Orchestrator-4B
+revision_of_model: <commit_hash>

This ensures consistent behavior across training runs.

examples/mimo/mimo-7b-qlora.yaml (1)

15-17: Consider explicitly specifying chat_template.

The dataset type is chat_template but no explicit chat_template field is specified (unlike the Plano example which uses chat_template: qwen3). If MiMo requires a specific chat template, it should be explicitly declared for clarity.

If a specific template is needed, add it explicitly:

 datasets:
   - path: fozziethebeat/alpaca_messages_2k_test
     type: chat_template
+    # chat_template: <template_name>  # Specify if needed
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2155ea and ad93346.

📒 Files selected for processing (7)
  • README.md
  • examples/mimo/README.md
  • examples/mimo/mimo-7b-qlora.yaml
  • examples/plano/README.md
  • examples/plano/plano-4b-qlora.yaml
  • examples/trinity/README.md
  • examples/trinity/trinity-nano-preview-qlora.yaml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-22T13:23:41.455Z
Learnt from: winglian
Repo: axolotl-ai-cloud/axolotl PR: 3095
File: src/axolotl/cli/merge_lora.py:65-81
Timestamp: 2025-08-22T13:23:41.455Z
Learning: The `lora_on_cpu` configuration in Axolotl is only relevant when loading the full model into memory (standard LoRA merge approach), not when processing individual shards in the memory-efficient approach.

Applied to files:

  • examples/plano/plano-4b-qlora.yaml
  • examples/mimo/mimo-7b-qlora.yaml
🪛 LanguageTool
examples/mimo/README.md

[style] ~17-~17: Consider using polite language here.
Context: ... This config uses about 17.2 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

examples/plano/README.md

[style] ~19-~19: Consider using polite language here.
Context: ...` This config uses about 5.1 GiB VRAM. Let us know how it goes. Happy finetuning! 🚀 ### ...

(INSERT_PLEASE)

🪛 markdownlint-cli2 (0.18.1)
examples/mimo/README.md

23-23: Link text should be descriptive

(MD059, descriptive-link-text)

examples/plano/README.md

30-30: Link text should be descriptive

(MD059, descriptive-link-text)

🔇 Additional comments (7)
README.md (1)

32-32: LGTM! Clear documentation of new model support.

The update properly highlights the addition of Plano-Orchestrator and MiMo examples alongside existing models with correct links to their respective directories.

examples/mimo/README.md (1)

1-39: Well-documented guide for MiMo fine-tuning.

The README provides clear instructions and context about MiMo's Multiple-Token Prediction approach, with appropriate references to the paper and resources.

examples/mimo/mimo-7b-qlora.yaml (2)

1-67: Configuration looks good with proper revision pinning.

The QLoRA configuration is well-structured with trust_remote_code and pinned revision for reproducibility.


61-61: MiMo-7B-RL is a standard transformer model that supports flash attention through Axolotl's training framework. The configuration is valid and properly supported.

examples/trinity/trinity-nano-preview-qlora.yaml (1)

3-3: Good practice: pinning model revision for reproducibility.

The specific model revision 2ee94b0 is valid for the arcee-ai/Trinity-Nano-Preview model and ensures consistent behavior across runs.

examples/plano/README.md (1)

1-42: Documentation is accurate and well-structured.

The README provides clear fine-tuning instructions for Plano-Orchestrator. The stated 5.1 GiB VRAM usage for 4B + QLoRA is realistic and within expected ranges for this configuration. Industry documentation confirms QLoRA fine-tuning of 4B models typically consumes 3–5 GB VRAM, and the Cut Cross Entropy plugin further reduces loss computation overhead by chunking logit materialization rather than creating full vocab×tokens matrices. No verification concerns identified.

examples/plano/plano-4b-qlora.yaml (1)

59-59: Plano-Orchestrator-4B supports flash attention, so the configuration is correct. The difference from the Trinity example (where flash attention is commented as "Not supported") reflects that these models have different attention capabilities—Trinity does not support flash attention while Plano-Orchestrator-4B does.

Comment thread examples/trinity/README.md Outdated
@github-actions

github-actions Bot commented Dec 24, 2025

Copy link
Copy Markdown
Contributor

📖 Documentation Preview: https://694d1ccdba8931729e083521--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 88e451b

@NanoCode012 NanoCode012 merged commit 4f5e8a3 into main Dec 25, 2025
3 checks passed
@NanoCode012 NanoCode012 deleted the feat/mimo branch December 25, 2025 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants