Add support for Qwen3.5 MoE #1109

Merged
Mecoli1219 merged 7 commits into linkedin:main from michaelroyzen:add-qwen3_5_moe
Mar 2, 2026

Conversation


@michaelroyzen michaelroyzen commented Feb 26, 2026

Add Qwen3.5 MoE support to Liger Kernel

Summary

  • Adds Liger Kernel optimizations for the Qwen3.5 MoE model family (qwen3_5_moe / qwen3_5_moe_text), targeting Transformers v5+
  • Qwen3.5 MoE combines Qwen3 Next's hybrid GDN/attention architecture with Sparse MoE (shared + routed experts), so the implementation mirrors Qwen3 Next's Liger integration: Gemma-style RMSNorm (LigerRMSNormForQwen3Next), fused SwiGLU experts (LigerExperts), and fused linear cross-entropy loss
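For context, "Gemma-style" RMSNorm (the behavior LigerRMSNormForQwen3Next reproduces) scales the normalized activations by (1 + weight) rather than by weight alone. A minimal pure-Python sketch of the arithmetic, illustrative only and not the fused Triton kernel:

```python
import math

def gemma_rmsnorm(x, weight, eps=1e-6):
    """Toy reference for Gemma-style RMSNorm: note the (1 + w) offset scaling."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * (1.0 + w) for v, w in zip(x, weight)]
```

With zero-initialized weights the (1 + w) form reduces to plain RMS normalization, which is why this variant needs a different kernel than the standard weight-only RMSNorm.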

Changes

New file:

  • src/liger_kernel/transformers/model/qwen3_5_moe.py — lce_forward for Qwen3_5MoeForCausalLM, based on the Qwen3 Next version with the load_balancing_loss_func import updated to point to Qwen3.5 MoE's local definition
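The fused linear cross-entropy path avoids materializing the full [num_tokens, vocab_size] logits tensor by folding the lm_head projection into the loss computation chunk by chunk. A toy pure-Python sketch of the idea (function and argument names are illustrative, not the actual lce_forward signature):

```python
import math

def cross_entropy_from_hidden(hidden, lm_head, targets, chunk_size=2):
    """Toy fused linear cross-entropy: hidden is a list of token vectors,
    lm_head is a list of vocab-row weight vectors, targets are class ids.
    Logits are computed per token and discarded, never stored in bulk."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        for h, t in zip(hidden[start:start + chunk_size],
                        targets[start:start + chunk_size]):
            # Project one token to logits, then take log-softmax at the target.
            logits = [sum(hi * wi for hi, wi in zip(h, w)) for w in lm_head]
            m = max(logits)
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[t]
    return total / len(hidden)
```

The real kernel does the same reduction on GPU tiles, which is where the memory savings at large vocab sizes come from.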

Modified files:

  • src/liger_kernel/transformers/monkey_patch.py — apply_liger_kernel_to_qwen3_5_moe function (RMSNorm, SwiGLU experts, fused LCE; RoPE disabled) with instance patching for norm layers, shared expert, and routed experts; registered as qwen3_5_moe and qwen3_5_moe_text in MODEL_TYPE_TO_APPLY_LIGER_FN
  • src/liger_kernel/transformers/__init__.py — Export apply_liger_kernel_to_qwen3_5_moe in TYPE_CHECKING, __getattr__, and __all__
  • test/utils.py — revert_liger_kernel_to_qwen3_5_moe for test cleanup
  • test/convergence/fp32/test_mini_models.py — Availability check, imports, and MiniModelConfig entry for mini_qwen3_5_moe
  • test/transformers/test_monkey_patch.py — is_qwen3_5_moe_available helper and test_apply_liger_kernel_to_instance_for_qwen3_5_moe verifying all patches are applied correctly
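The model-type registration described above follows a plain dispatch-table pattern. A self-contained sketch, where the stub patch function is a placeholder rather than the real implementation and only the dict name mirrors the PR:

```python
# Placeholder patch function; the real one swaps in Liger's RMSNorm,
# SwiGLU experts, and fused linear cross-entropy on the model instance.
def apply_liger_kernel_to_qwen3_5_moe(**kwargs):
    return f"patched qwen3_5_moe with {kwargs}"

# Both the composite and text-only model types dispatch to the same patch,
# as in the PR's MODEL_TYPE_TO_APPLY_LIGER_FN registration.
MODEL_TYPE_TO_APPLY_LIGER_FN = {
    "qwen3_5_moe": apply_liger_kernel_to_qwen3_5_moe,
    "qwen3_5_moe_text": apply_liger_kernel_to_qwen3_5_moe,
}

def apply_liger_kernel(model_type, **kwargs):
    """Look up and invoke the patch function for a given model_type."""
    if model_type not in MODEL_TYPE_TO_APPLY_LIGER_FN:
        raise ValueError(f"No Liger patch registered for {model_type!r}")
    return MODEL_TYPE_TO_APPLY_LIGER_FN[model_type](**kwargs)
```

Registering both model-type strings is what lets the kernel apply whether the config reports the composite or the text-only architecture.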

Test plan

  • test_apply_liger_kernel_to_instance_for_qwen3_5_moe passes (monkey patch instance patching)
  • mini_qwen3_5_moe convergence test passes (fp32 mini model)
  • Existing Qwen3 Next and Qwen3 MoE tests still pass (no regressions)

@michaelroyzen michaelroyzen mentioned this pull request Feb 26, 2026
@michaelroyzen
Contributor Author

@shimizust @Tcc0403


michaelroyzen commented Feb 26, 2026

Convergence test passes
(screenshot: convergence test results, Feb 26, 2026)

Collaborator

@Tcc0403 Tcc0403 left a comment


@Mecoli1219 can you take a look?


michaelroyzen commented Feb 27, 2026

(screenshots: Qwen3-Next test results, Feb 27, 2026)

Confirming Qwen3-Next still passes

@michaelroyzen
Contributor Author

Are we ready to merge @Tcc0403 @Mecoli1219?

Collaborator

@Mecoli1219 Mecoli1219 left a comment


@michaelroyzen This looks great! Thanks for the contribution. Could you please rebase with the main branch and run make checkstyle to ensure the formatting is consistent? Let's get this merged once the build is green!

@michaelroyzen
Contributor Author

Thanks, just rebased and ran make checkstyle @Mecoli1219

@Mecoli1219 Mecoli1219 added this pull request to the merge queue Mar 2, 2026
Merged via the queue into linkedin:main with commit 9983acb Mar 2, 2026
5 of 7 checks passed
@vvvdwbvvv vvvdwbvvv mentioned this pull request Mar 7, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 12, 2026
## Summary

This PR fixes the `lce_forward` function for the qwen3_5_moe model, adding
support for the optional `mm_token_type_ids` parameter used in multimodal
processing.

Follow-up to:
- #1120
- #1109

This fixes a ValueError in `model.generate()` with transformers > 5.2.0,
after they merged:
- huggingface/transformers#43972

See related issue downstream in TRL:
- huggingface/trl#5216
- huggingface/trl#5201
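A hypothetical minimal repro of the failure mode: newer transformers versions forward `mm_token_type_ids` from `generate()` into the model forward, so a patched forward that does not declare the parameter fails on the unexpected keyword (in plain Python as a TypeError; transformers' own kwarg validation surfaces it as a ValueError). Accepting it as an optional keyword resolves the call. Sketch only; the real signature lives in src/liger_kernel/transformers/model/qwen3_5_moe.py:

```python
# Before: forward without the multimodal kwarg rejects the call.
def lce_forward_old(input_ids):
    return len(input_ids)

# After: the kwarg is declared and simply defaults to None, so callers
# that pass it (e.g. generate() on transformers > 5.2.0) no longer fail.
def lce_forward_fixed(input_ids, mm_token_type_ids=None):
    return len(input_ids)
```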


## Testing Done

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
