Add Gemma4 MoE quantization support #1219
Merged
Commits (9, all by yueshen2016):
- 424cf24 Add VLM base model support for auto_quantize in hf_ptq
- b415c60 Fix code quality: add mypy assert and ruff blank lines
- 9f50af6 Add Gemma4 MoE quantization support
- 73b799c Fix code quality: ruff SIM103 and line formatting
- 53de662 Address PR review comments
- 0f3b3bd Add Gemma4 to get_experts_list for export compatibility
- 63b04ba Add *.experts.* patterns to nvfp4_mlp_only YAML recipe
- ea21ea4 Simplify auto_quantize base-model branching in hf_ptq
- 01fc243 Add unit tests for is_moe structural detection and get_expert_linear_…
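
Commit 63b04ba adds `*.experts.*` patterns to the nvfp4_mlp_only recipe. A minimal sketch of why that matters, assuming the recipe keys are fnmatch-style globs over fully qualified module names (the actual YAML contents are not shown here, and the example module names are made up):

```python
# Hypothetical illustration: recipe patterns treated as fnmatch-style globs
# over module names. Example names below are invented for demonstration.
from fnmatch import fnmatch

names = [
    "model.layers.0.mlp.gate_proj",            # dense MLP projection
    "model.layers.1.mlp.experts.3.gate_proj",  # routed MoE expert projection
]
for name in names:
    # An MLP-only pattern like "*mlp.gate_proj*" misses the expert weight
    # (the path has "mlp.experts.3.gate_proj", never "mlp.gate_proj"),
    # while "*.experts.*" matches it, so MoE experts are covered too.
    print(name, fnmatch(name, "*.experts.*"))
```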
The diff adds one new file (+115 lines): unit tests for MoE detection and expert naming in `modelopt.torch.export.layer_utils`.

```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Unit tests for modelopt.torch.export.layer_utils — MoE detection and expert naming."""

import pytest
import torch.nn as nn

from modelopt.torch.export.layer_utils import get_expert_linear_names, is_moe

# ---------------------------------------------------------------------------
# is_moe tests
# ---------------------------------------------------------------------------


class _FakeSparseMoeBlock(nn.Module):
    """Name ends with 'sparsemoeblock' — detected by naming convention."""


class _FakeMoeLayer(nn.Module):
    """Name contains 'moelayer' — detected by naming convention."""


class _FakeArcticMoe(nn.Module):
    """Name contains 'arcticmoe' — detected by explicit match."""


class _StructuralMoeModule(nn.Module):
    """Has router + experts attributes — detected by structural check."""

    def __init__(self):
        super().__init__()
        self.router = nn.Linear(8, 4)
        self.experts = nn.ModuleList([nn.Linear(8, 8) for _ in range(4)])


class _NotMoeModule(nn.Module):
    """Plain module — should NOT be classified as MoE."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)


class _PartialStructuralModule(nn.Module):
    """Has router but no experts — should NOT be classified as MoE."""

    def __init__(self):
        super().__init__()
        self.router = nn.Linear(8, 4)


@pytest.mark.parametrize(
    "module_cls",
    [_FakeSparseMoeBlock, _FakeMoeLayer, _FakeArcticMoe],
)
def test_is_moe_name_based(module_cls):
    assert is_moe(module_cls())


def test_is_moe_structural():
    assert is_moe(_StructuralMoeModule())


def test_is_moe_negative():
    assert not is_moe(_NotMoeModule())


def test_is_moe_partial_structural():
    assert not is_moe(_PartialStructuralModule())


# ---------------------------------------------------------------------------
# get_expert_linear_names tests
# ---------------------------------------------------------------------------


class _FakeGemma4TextDecoderLayer(nn.Module):
    pass


class _FakeMixtralSparseMoeBlock(nn.Module):
    pass


class _FakeNemotronHMOE(nn.Module):
    pass


def test_get_expert_linear_names_gemma4():
    assert get_expert_linear_names(_FakeGemma4TextDecoderLayer()) == [
        "gate_proj",
        "down_proj",
        "up_proj",
    ]


def test_get_expert_linear_names_mixtral():
    assert get_expert_linear_names(_FakeMixtralSparseMoeBlock()) == ["w1", "w2", "w3"]


def test_get_expert_linear_names_nemotron():
    assert get_expert_linear_names(_FakeNemotronHMOE()) == ["up_proj", "down_proj"]
```