Changes to support latent MoEs #2296
Conversation
d088236 to b0a2d8c Compare
Thanks for the work. Could you also add unit tests and integration tests covering this feature combined with EP/TP?
yanring left a comment
Overall LGTM, left a few comments.
        16 SMs can generally achieve good bandwidth."""

    moe_latent_size: Optional[int] = None
    """Latent projection dimension for MoE. If None, MoE latent projections are not used."""
Could you elaborate on it a bit here?
Yeah, do you have a reference for latent MoEs?
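For anyone landing here without context, a minimal sketch of the idea (all names and shapes below are illustrative assumptions, not this PR's actual API): with latent projections, each token is down-projected from the hidden dimension to `moe_latent_size` before expert computation and projected back afterwards, so dispatch/combine traffic and the expert MLPs operate on smaller vectors.

```python
# Illustrative sketch of MoE latent projections (hypothetical names, not the
# PR's code). Experts run in a smaller latent space: hidden -> latent before
# dispatch, latent -> hidden after combine, shrinking per-token communication
# volume from hidden_size to moe_latent_size.
def apply_latent_moe(token, w_down, experts_fn, w_up):
    # Down-project: latent[i] = sum_j token[j] * w_down[j][i]
    latent = [sum(token[j] * w_down[j][i] for j in range(len(token)))
              for i in range(len(w_down[0]))]
    out = experts_fn(latent)  # expert MLPs operate in the latent dimension
    # Up-project back to the hidden dimension
    return [sum(out[i] * w_up[i][j] for i in range(len(out)))
            for j in range(len(w_up[0]))]
```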
        dispatched_input, probs = self.dispatch(hidden_states, probs)
        output, mlp_bias = self.routed_experts_compute(dispatched_input, probs, residual)

        if self.config.moe_latent_size and mlp_bias is not None:
Please document the change here to explain the bias handling update.
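To make the question concrete, here is a hedged guess at the convention under discussion (hypothetical helper, not the PR's actual code): when latent projections are enabled, the experts' bias lives in the latent dimension, so it has to be folded into the output before the up projection instead of being returned for a later fused add.

```python
# Hypothetical sketch of the bias-handling change (an assumption, not the
# PR's implementation).
def finalize_routed_output(output, mlp_bias, moe_latent_size):
    # With latent projections enabled, the expert bias is in the latent
    # dimension, so fold it in before up-projecting; otherwise keep
    # returning (output, bias) for a downstream fused addition.
    if moe_latent_size and mlp_bias is not None:
        output = [o + b for o, b in zip(output, mlp_bias)]
        mlp_bias = None
    return output, mlp_bias
```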
        # Initialize latent projections
        if self.config.moe_latent_size:
            assert HAVE_TE
Perhaps `assert HAVE_TE, "TransformerEngine is required for MoE latent projections."`
            config=self.config,
            init_method=self.config.output_layer_init_method,
            bias=self.config.add_bias_linear,
            skip_bias_add=True,
Perhaps we could set `skip_bias_add=False` so that any necessary bias addition is handled internally by `TELinear`.
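For reference, a minimal sketch of the `skip_bias_add` convention (a plain-Python stand-in, not actual TELinear code): with `skip_bias_add=True` the layer returns the unbiased output together with the bias so the caller can fuse the addition into a later op; with `skip_bias_add=False` the addition happens inside the layer.

```python
# Stand-in for the skip_bias_add convention (illustrative, not TELinear).
# weight is stored as a list of output-unit columns.
def linear_forward(x, weight, bias, skip_bias_add):
    out = [sum(xi * wi for xi, wi in zip(x, col)) for col in weight]
    if skip_bias_add:
        return out, bias  # caller is responsible for adding the bias
    return [o + b for o, b in zip(out, bias)], None  # bias added internally
```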
        # Project the hidden_states from hidden dimension down to latent dimension.
        if self.config.moe_latent_size:
            assert (
                not self.shared_expert_overlap
            ), "Shared expert overlap not supported when MoE latent projections are used."
            hidden_states, _ = self.fc1_latent_proj(hidden_states)
I believe this projection needs to happen before we call `self.token_dispatcher.dispatch_preprocess` (in `router_and_preprocess`). Otherwise, the `hidden_shape` in the token dispatcher gets set to the original hidden size instead of the latent size, which will result in a shape error when trying to apply `fc2_latent_proj`.
…the MoE routing happens in the hidden dimension and the correct tensor shape is captured in the token dispatcher
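A hypothetical forward skeleton of the ordering being agreed on here (all names are placeholders, not the actual Megatron-LM code): routing runs on the full hidden dimension, the down projection happens before `dispatch_preprocess` so the dispatcher records the latent shape, and the up projection restores the hidden dimension at the end.

```python
# Placeholder skeleton of the agreed ordering (an illustrative assumption,
# not Megatron-LM's real forward pass).
def moe_forward(hidden_states, router, fc1_latent_proj, dispatcher,
                experts, fc2_latent_proj):
    probs, routing_map = router(hidden_states)       # routing in hidden dim
    latent = fc1_latent_proj(hidden_states)          # down-project to latent
    dispatched = dispatcher.dispatch_preprocess(latent, routing_map)
    expert_out = experts(dispatched, probs)          # experts in latent dim
    combined = dispatcher.combine(expert_out)
    return fc2_latent_proj(combined)                 # back to hidden dim
```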
/ok to test dd645a8
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
/ok to test 80f1d5f
Signed-off-by: Deepak Narayanan <dnarayanan@nvidia.com>
/ok to test 4257d6a
/ok to test 44b708d
/ok to test ec58811
No description provided.