feat(qwen3-next): Adds targeting of shared expert and attention modules by miketung · Pull Request #3183 · axolotl-ai-cloud/axolotl

miketung · 2025-09-24T22:51:20Z

Description

Adds targeting of shared experts and attention modules for qwen3-next layers. Tested on 1x H200 and 8x H200.


Summary by CodeRabbit

New Features
- Expanded the qwen3-next QLoRA example config to target additional modules in the linear-attention, shared-expert, and MoE gating layers, so adapters cover more of each decoder layer.
Chores
- Updated `lora_target_modules` in the example config to include the new target modules, so adapters are applied more broadly during fine-tuning.

coderabbitai · 2025-09-24T22:51:28Z


Walkthrough

Added eight entries to the lora_target_modules list in the QLoRA YAML config for qwen3-next-80b-a3b. No other changes.

Changes

Cohort / File(s)	Summary
QLoRA config update `examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml`	Added to lora_target_modules: linear_attn.in_proj_ba, linear_attn.in_proj_qkvz, linear_attn.out_proj, shared_expert.up_proj, shared_expert.down_proj, shared_expert.gate_proj, shared_expert_gate, mlp.gate
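For reference, here is a sketch of how the eight added entries sit in `lora_target_modules`. The entries that were already in the example config are not listed in this walkthrough, so they are elided rather than guessed. The per-module comments are an assumption based on the Qwen3-Next layer layout in Hugging Face Transformers (Gated DeltaNet linear attention plus a sparse MoE block with a shared expert), not text from the PR.

```yaml
# Sketch only: pre-existing entries of lora_target_modules in
# examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml are not reproduced here.
lora_target_modules:
  # ... existing entries ...
  # Linear-attention (Gated DeltaNet) input and output projections
  - linear_attn.in_proj_ba
  - linear_attn.in_proj_qkvz
  - linear_attn.out_proj
  # Shared expert MLP, applied alongside the routed experts
  - shared_expert.up_proj
  - shared_expert.down_proj
  - shared_expert.gate_proj
  # Gate applied to the shared expert's output (assumed role)
  - shared_expert_gate
  # MoE router that scores the routed experts (assumed role)
  - mlp.gate
```

The `shared_expert.` prefix likely matters. Assuming PEFT's usual matching, where each listed name is compared against the end of a module's full path, `shared_expert.gate_proj` adapts only the shared expert. A bare `gate_proj` would also match any routed-expert modules with that name.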

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.
Title Check	✅ Passed	The title succinctly and accurately summarizes the primary change (adding targeting of shared expert and attention modules for qwen3-next) and follows the conventional commit style, reflecting the added LoRA target module entries without unnecessary detail.


coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 856ff12 and 34f8854.

📒 Files selected for processing (1)

examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml (1 hunks)

NanoCode012 · 2025-09-25T07:29:00Z

Thanks, could you also update the README with the updated VRAM usage?

Adds targetting of shared expert and attention modules in each layer

34f8854

coderabbitai Bot reviewed Sep 24, 2025


Comment thread examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml

NanoCode012 self-assigned this Sep 25, 2025

Update VRAM usage

31362aa

NanoCode012 approved these changes Sep 25, 2025


NanoCode012 changed the title ~~Adds targetting of shared expert and attention modules in each layer~~ feat(qwen3-next): Adds targetting of shared expert and attention modules Sep 25, 2025

NanoCode012 changed the title ~~feat(qwen3-next): Adds targetting of shared expert and attention modules~~ feat(qwen3-next): Adds targeting of shared expert and attention modules Sep 25, 2025

NanoCode012 merged commit 33975ce into axolotl-ai-cloud:main Sep 25, 2025
1 check passed

coderabbitai Bot mentioned this pull request on Mar 27, 2026 in qwen3.5 configs #3554 (merged).

NanoCode012 merged 2 commits into axolotl-ai-cloud:main from miketung:fix/qwen3-next-config.

miketung commented Sep 24, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Sep 24, 2025 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

NanoCode012 commented Sep 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

miketung commented Sep 24, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NanoCode012 commented Sep 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

miketung commented Sep 24, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Sep 24, 2025 •

edited

Loading