feat(qwen3-next): Adds targeting of shared expert and attention modules#3183

Merged
NanoCode012 merged 2 commits into axolotl-ai-cloud:main from miketung:fix/qwen3-next-config on Sep 25, 2025

@miketung (Contributor) commented Sep 24, 2025

Description

Adds targeting of the shared-expert and attention modules in qwen3-next layers. Tested on 1×H200 and 8×H200.

Motivation and Context

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

Summary by CodeRabbit

  • New Features

    • Expanded QLoRA configuration to target additional components across attention, expert, and MLP layers, enabling more comprehensive fine-tuning coverage and potentially improved training outcomes.
  • Chores

    • Updated configuration defaults to include the new target components for broader adapter application during fine-tuning.

@coderabbitai (Bot, Contributor) commented Sep 24, 2025

Walkthrough

Added eight entries to the lora_target_modules list in the QLoRA YAML config for qwen3-next-80b-a3b. No other changes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| QLoRA config update: examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml | Added to lora_target_modules: linear_attn.in_proj_ba, linear_attn.in_proj_qkvz, linear_attn.out_proj, shared_expert.up_proj, shared_expert.down_proj, shared_expert.gate_proj, shared_expert_gate, mlp.gate |
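To illustrate how dotted entries such as shared_expert.up_proj could narrow which modules receive adapters, here is a minimal Python sketch. It assumes a suffix rule (a module is selected when its full name equals an entry or ends with "." plus that entry), which is my understanding of how PEFT-style target lists are commonly matched, not something this PR states. The module paths in EXAMPLE_MODULES are hypothetical and are not taken from the model; only the eight target strings come from the walkthrough above. Entries that were already in the config before this PR are not shown.

```python
# Entries added by this PR, copied from the walkthrough table.
ADDED_TARGETS = [
    "linear_attn.in_proj_ba",
    "linear_attn.in_proj_qkvz",
    "linear_attn.out_proj",
    "shared_expert.up_proj",
    "shared_expert.down_proj",
    "shared_expert.gate_proj",
    "shared_expert_gate",
    "mlp.gate",
]


def matches_target(module_name: str, targets: list[str]) -> bool:
    """Assumed suffix rule: exact match, or the name ends with '.' + target."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Hypothetical module paths, for illustration only.
EXAMPLE_MODULES = [
    "model.layers.0.linear_attn.in_proj_qkvz",
    "model.layers.0.mlp.shared_expert.up_proj",
    "model.layers.0.mlp.shared_expert_gate",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.experts.0.up_proj",  # routed expert, not a shared expert
    "model.layers.0.input_layernorm",
]

selected = [name for name in EXAMPLE_MODULES if matches_target(name, ADDED_TARGETS)]

for name in selected:
    print(name)
```

Under this assumed rule, the dotted prefix keeps shared_expert.up_proj from matching the routed-expert path experts.0.up_proj, so the eight new entries by themselves select only the linear-attention, shared-expert, and gate modules.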

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The title succinctly and accurately summarizes the primary change (adding targeting of shared expert and attention modules for qwen3-next) and follows the conventional commit style, reflecting the added LoRA target module entries without unnecessary detail. |

@coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 856ff12 and 34f8854.

📒 Files selected for processing (1)
  • examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml (1 hunks)

Comment thread examples/qwen3-next/qwen3-next-80b-a3b-qlora.yaml
@NanoCode012 self-assigned this on Sep 25, 2025
@NanoCode012 (Collaborator) commented

Thanks, could you also update the README with the updated VRAM usage?

@NanoCode012 changed the title from "Adds targetting of shared expert and attention modules in each layer" to "feat(qwen3-next): Adds targetting of shared expert and attention modules" on Sep 25, 2025
@NanoCode012 changed the title from "feat(qwen3-next): Adds targetting of shared expert and attention modules" to "feat(qwen3-next): Adds targeting of shared expert and attention modules" on Sep 25, 2025
@NanoCode012 merged commit 33975ce into axolotl-ai-cloud:main on Sep 25, 2025
1 check passed
@coderabbitai (Bot) mentioned this pull request on Mar 27, 2026
2 participants