qwen3.5 configs by ved1beta · Pull Request #3554 · axolotl-ai-cloud/axolotl

ved1beta · 2026-03-27T11:53:27Z

Change `lora_target_modules`:

- `gate_up_proj`
- `down_proj`

Summary by CodeRabbit

Release Notes

Documentation
- Updated configuration examples to clarify targeting options for shared and routed expert modules in mixture-of-experts models.
- Added guidance on using alternative configuration parameters for different expert types.

coderabbitai · 2026-03-27T11:53:44Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This PR updates documentation in 4 Qwen3.5 LoRA MoE example configurations by replacing regex-based expert targeting guidance with commented entries for shared-expert modules (gate_up_proj, down_proj) and clarifying that routed experts should use lora_target_parameters.
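As a rough illustration of the layout described above, the LoRA section of one of these example configs might look like the fragment below. This is a sketch under assumptions, not the merged file: only `lora_target_modules`, `lora_target_parameters`, `gate_up_proj`, and `down_proj` come from this PR. The attention module names and the routed-expert parameter paths are hypothetical placeholders and should be checked against the actual model before use.

```yaml
# Sketch only; see the example YAMLs in examples/qwen3.5/ for the real values.
lora_target_modules:
  # Assumed attention projection names (not taken from this PR).
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  # Shared-expert modules named in this PR. Uncomment to train them as well.
  # - gate_up_proj
  # - down_proj

# Routed experts are 3D parameters, so they are targeted here rather than
# through lora_target_modules. The paths below are hypothetical placeholders.
# lora_target_parameters:
#   - mlp.experts.gate_up_proj
#   - mlp.experts.down_proj
```

With everything commented out, only the uncommented modules receive adapters. Uncommenting either group should raise the trainable parameter count, which is the behavior the reviewer asks to have verified later in this thread.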

Changes

| Cohort / File(s) | Summary |
|---|---|
| Qwen3.5 MoE LoRA Config Documentation: `examples/qwen3.5/122b-a10b-moe-qlora-fsdp.yaml`, `examples/qwen3.5/122b-a10b-moe-qlora.yaml`, `examples/qwen3.5/35b-a3b-moe-qlora-fsdp.yaml`, `examples/qwen3.5/35b-a3b-moe-qlora.yaml` | Added commented entries for shared-expert module targets (`gate_up_proj`, `down_proj`); removed regex-based targeting guidance; clarified that routed experts are 3D parameters requiring `lora_target_parameters` configuration. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

- feat: add doc for expert quantization, glm45 air example configs, and update readme for release (#3452): Addresses MoE LoRA configuration guidance for expert handling in example configs
- feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183): Modifies Qwen3-style YAML configs to document shared-expert projection module targeting
- Qwen3.5-MoE example config with lora_target_modules regex (#3515): Directly modifies the same Qwen3.5 example YAMLs' lora_target_modules configuration

Suggested reviewers

NanoCode012
winglian

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'qwen3.5 configs' is vague and generic, using non-descriptive terminology that doesn't convey the specific nature of the changes (updating lora_target_modules documentation and comments across four configuration files). | Consider using a more specific title like 'Update qwen3.5 LoRA configs with shared expert targeting guidance' to clearly indicate the main purpose of the changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

NanoCode012 · 2026-03-27T12:04:28Z

Could you test one of these configs step by step to confirm that the following hold?

- leaving them all commented: low trainable params
- targeting experts: more trainable params
- targeting experts + shared experts: even more params

ved1beta · 2026-03-29T05:35:27Z

Tested on the nemo super branch (#3508).

NanoCode012

Could you also update L60 in the README on Shared Experts to point to this new case?

qwen3.5 configs

cc6724a

NanoCode012 reviewed Mar 30, 2026

ved1beta force-pushed the pram_qwen3.5 branch from 9d82756 to cc6724a · March 30, 2026 09:41

ved1beta and others added 2 commits March 30, 2026 15:13

Merge branch 'main' into pram_qwen3.5

9dd323d

update shared experts readme

9fbe673

NanoCode012 approved these changes Mar 31, 2026

winglian merged commit 9e64c76 into axolotl-ai-cloud:main Apr 1, 2026
3 checks passed

coderabbitai Bot mentioned this pull request Apr 2, 2026

fix(yaml): add cce and liger to nemotron-h example #3573

Merged

coderabbitai Bot mentioned this pull request Apr 10, 2026

Gemma4 fixes and profiler #3591

Merged

qwen3.5 configs #3554: winglian merged 3 commits into axolotl-ai-cloud:main from ved1beta:pram_qwen3.5
