Merged

**Tcc0403** reviewed on Feb 27, 2026:

> @Mecoli1219 can you take a look?
The PR author replied:

> Are we ready to merge @Tcc0403 @Mecoli1219?
**Mecoli1219** reviewed on Mar 1, 2026:

> @michaelroyzen This looks great! Thanks for the contribution. Could you please rebase with the main branch and run `make checkstyle` to ensure the formatting is consistent? Let's get this merged once the build is green!
The author force-pushed from `390cd7d` to `7a83092` ("… bf16 test matching Qwen3 MoE tolerances").
The PR author replied:

> Thanks, just rebased and ran `make checkstyle`.
**Mecoli1219** approved these changes on Mar 2, 2026.

The github-merge-queue bot pushed a commit referencing this pull request on Mar 12, 2026:
## Summary

This PR fixes the `lce_forward` function for the qwen3_5_moe model, adding support for the optional `mm_token_type_ids` parameter related to multimodal processing.

Follow-up to:

- #1120
- #1109

This fixes a `ValueError` in `model.generate()` with transformers > 5.2.0, after they merged:

- huggingface/transformers#43972

See related issues downstream in TRL:

- huggingface/trl#5216
- huggingface/trl#5201

## Testing Done

- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
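The signature issue the commit fixes can be sketched as follows. This is a minimal illustration, not Liger Kernel's actual code: the function bodies are toy stand-ins, and the error surfaces here as a plain `TypeError` at the Python call boundary rather than the `ValueError` transformers raises during `generate()` kwarg validation.

```python
from typing import Optional


def lce_forward_old(input_ids, labels=None):
    # A patched forward with a frozen signature: when the upstream library
    # starts passing a new kwarg (mm_token_type_ids), the call fails.
    return {"input_ids": input_ids, "labels": labels}


def lce_forward_new(input_ids, labels=None,
                    mm_token_type_ids: Optional[list] = None, **kwargs):
    # Accept the new optional parameter (and tolerate future additions via
    # **kwargs), forwarding it only when the caller actually provides it.
    out = {"input_ids": input_ids, "labels": labels}
    if mm_token_type_ids is not None:
        out["mm_token_type_ids"] = mm_token_type_ids
    return out


# The frozen signature rejects the new kwarg:
try:
    lce_forward_old([1, 2], mm_token_type_ids=[0, 0])
except TypeError as exc:
    print("old signature rejects the new kwarg:", exc)

# The updated signature handles it:
print(lce_forward_new([1, 2], mm_token_type_ids=[0, 0]))
```

The general point is that a monkey-patched `forward` must track every optional parameter the upstream model gains, or calls that pass the new kwarg will fail even though the patched kernel never uses it.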
# Add Qwen3.5 MoE support to Liger Kernel

## Summary

Adds Liger Kernel support for the Qwen3.5 MoE models (model types `qwen3_5_moe`/`qwen3_5_moe_text`), targeting Transformers v5+. The patch applies Liger RMSNorm (`LigerRMSNormForQwen3Next`), fused SwiGLU experts (`LigerExperts`), and fused linear cross-entropy loss.

## Changes
New file:

- `src/liger_kernel/transformers/model/qwen3_5_moe.py` — `lce_forward` for `Qwen3_5MoeForCausalLM`, based on the Qwen3 Next version with the `load_balancing_loss_func` import updated to point to Qwen3.5 MoE's local definition

Modified files:
- `src/liger_kernel/transformers/monkey_patch.py` — `apply_liger_kernel_to_qwen3_5_moe` function (RMSNorm, SwiGLU experts, fused LCE; RoPE disabled) with instance patching for norm layers, the shared expert, and routed experts; registered as `qwen3_5_moe` and `qwen3_5_moe_text` in `MODEL_TYPE_TO_APPLY_LIGER_FN`
- `src/liger_kernel/transformers/__init__.py` — Export `apply_liger_kernel_to_qwen3_5_moe` in `TYPE_CHECKING`, `__getattr__`, and `__all__`
- `test/utils.py` — `revert_liger_kernel_to_qwen3_5_moe` for test cleanup
- `test/convergence/fp32/test_mini_models.py` — Availability check, imports, and a `MiniModelConfig` entry for `mini_qwen3_5_moe`
- `test/transformers/test_monkey_patch.py` — `is_qwen3_5_moe_available` helper and `test_apply_liger_kernel_to_instance_for_qwen3_5_moe` verifying all patches are applied correctly

## Test plan
- `test_apply_liger_kernel_to_instance_for_qwen3_5_moe` passes (monkey patch instance patching)
- `mini_qwen3_5_moe` convergence test passes (fp32 mini model)
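The instance-patching flow described in the Changes section can be sketched as follows. This is a simplified illustration with toy classes: `MODEL_TYPE_TO_APPLY_LIGER_FN` and the apply-function name mirror the PR, but the module classes, the stand-in "fused" forward, and the registry lookup are assumptions for illustration only.

```python
import types


class TinyRMSNorm:
    """Toy stand-in for a HF norm module."""

    def forward(self, x):
        return list(x)  # identity placeholder


class TinyModel:
    """Toy stand-in for a Qwen3.5 MoE model instance."""

    model_type = "qwen3_5_moe"

    def __init__(self):
        self.norm = TinyRMSNorm()


def liger_rms_norm_forward(self, x):
    # Stand-in for a fused Liger kernel forward (real one runs a Triton kernel).
    return [v * 2 for v in x]


def apply_liger_kernel_to_qwen3_5_moe(model):
    # Instance-level patching: rebind `forward` on already-constructed
    # submodules, so even a loaded model picks up the fused kernels.
    model.norm.forward = types.MethodType(liger_rms_norm_forward, model.norm)


# Registry mapping model types to their apply function, mirroring the
# MODEL_TYPE_TO_APPLY_LIGER_FN registration described above.
MODEL_TYPE_TO_APPLY_LIGER_FN = {
    "qwen3_5_moe": apply_liger_kernel_to_qwen3_5_moe,
    "qwen3_5_moe_text": apply_liger_kernel_to_qwen3_5_moe,
}

model = TinyModel()
MODEL_TYPE_TO_APPLY_LIGER_FN[model.model_type](model)
print(model.norm.forward([1, 2, 3]))  # -> [2, 4, 6], the patched forward runs
```

The instance-patch test mentioned in the test plan checks exactly this kind of property: after applying the patch, each targeted submodule's `forward` is the Liger version rather than the original.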