qwen3.5 configs #3554

Merged: winglian merged 3 commits into axolotl-ai-cloud:main from ved1beta:pram_qwen3.5 on Apr 1, 2026
Conversation

ved1beta (Member) commented Mar 27, 2026

Change lora_target_modules in the Qwen3.5 MoE example configs to list the shared-expert modules:

  • gate_up_proj
  • down_proj

Summary by CodeRabbit

Release Notes

  • Documentation
    • Updated configuration examples to clarify targeting options for shared and routed expert modules in mixture-of-experts models.
    • Added guidance on using alternative configuration parameters for different expert types.

coderabbitai (Bot, Contributor) commented Mar 27, 2026

Review skipped: auto incremental reviews are disabled on this repository. To trigger a single review, invoke the @coderabbitai review command.


Walkthrough

This PR updates documentation in 4 Qwen3.5 LoRA MoE example configurations by replacing regex-based expert targeting guidance with commented entries for shared-expert modules (gate_up_proj, down_proj) and clarifying that routed experts should use lora_target_parameters.
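
As a rough illustration of the layout described above, a config fragment might look like the sketch below. It is not copied from the merged example files: the attention targets (q_proj, k_proj, v_proj, o_proj) and the routed-expert parameter names under lora_target_parameters are assumptions added for context. Only gate_up_proj, down_proj, and the lora_target_modules / lora_target_parameters keys come from this PR.

```yaml
# Illustrative sketch; not copied from the merged example files.
lora_target_modules:
  - q_proj   # assumed attention targets, shown only for context
  - k_proj
  - v_proj
  - o_proj
  # Shared-expert modules; uncomment to train them as well:
  # - gate_up_proj
  # - down_proj

# Routed experts are 3D parameters, so they are selected by parameter name
# through lora_target_parameters rather than by module name.
# The parameter names below are assumed, not taken from the PR.
# lora_target_parameters:
#   - mlp.experts.gate_up_proj
#   - mlp.experts.down_proj
```

With everything left commented, only the uncommented targets would receive adapters; uncommenting either group should raise the trainable-parameter count, which is the behaviour the review asks to verify.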

Changes

| Cohort / File(s) | Summary |
|---|---|
| Qwen3.5 MoE LoRA Config Documentation: examples/qwen3.5/122b-a10b-moe-qlora-fsdp.yaml, examples/qwen3.5/122b-a10b-moe-qlora.yaml, examples/qwen3.5/35b-a3b-moe-qlora-fsdp.yaml, examples/qwen3.5/35b-a3b-moe-qlora.yaml | Added commented entries for shared-expert module targets (gate_up_proj, down_proj); removed regex-based targeting guidance; clarified that routed experts are 3D parameters requiring lora_target_parameters configuration. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

  • NanoCode012
  • winglian
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'qwen3.5 configs' is vague and generic; it does not convey the specific nature of the changes (updating lora_target_modules documentation and comments across four configuration files). | Consider a more specific title such as 'Update qwen3.5 LoRA configs with shared expert targeting guidance' to state the main purpose of the changes. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


NanoCode012 (Collaborator) commented Mar 27, 2026

Could you test one of these configs step by step to confirm that the following hold?

  1. Leaving them all commented out: low trainable params
  2. Targeting experts: more trainable params
  3. Targeting experts + shared experts: even more trainable params

ved1beta (Member, Author) commented Mar 29, 2026

Tested on nemo super branch #3508.
[image attached in the original comment]

NanoCode012 (Collaborator) left a comment

Could you also update L60 in the README on Shared Experts to point to this new case?

@winglian winglian merged commit 9e64c76 into axolotl-ai-cloud:main Apr 1, 2026
3 checks passed