[NemotronH] Use ReplicatedLinear for fc1_latent_proj by roikoren755 · Pull Request #31807 · vllm-project/vllm

roikoren755 · 2026-01-06T12:53:29Z

Purpose

The current implementation of latent MoE uses a ColumnParallelLinear for the fc1_latent_proj layer. Profiling shows that the synchronization overhead is substantial, and because the layer itself is relatively small, replicating it instead improves performance without costing much in terms of VRAM usage.
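The trade-off described above can be illustrated with a toy, single-process sketch. It does not use vLLM or torch; the function names (`column_parallel_forward`, `replicated_forward`, `per_rank_weight_elems`) are hypothetical, and it assumes the column-parallel output has to be gathered across tensor-parallel ranks before it is used, which is the synchronization point the replicated layout avoids.

```python
# Toy model of the two layouts. "Ranks" are simulated in one process,
# and the all-gather is a plain list concatenation (an assumption for illustration).

def matvec(weight, x):
    """Compute y = W @ x, with W stored as a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def column_parallel_forward(weight, x, tp_size):
    """Each rank owns a slice of the output dimension; results are then gathered."""
    shard = len(weight) // tp_size  # assumes the output size divides evenly
    partials = [matvec(weight[r * shard:(r + 1) * shard], x) for r in range(tp_size)]
    # Stand-in for the cross-rank collective (a synchronization point).
    gathered = [v for part in partials for v in part]
    return gathered, 1  # (output, number of collectives)


def replicated_forward(weight, x, tp_size):
    """Every rank holds the full weight and computes the full output locally."""
    outputs = [matvec(weight, x) for _ in range(tp_size)]
    return outputs[0], 0  # no collective needed


def per_rank_weight_elems(out_features, in_features, tp_size, replicated):
    """Weight elements stored on each rank under the given layout."""
    total = out_features * in_features
    return total if replicated else total // tp_size


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # 4 outputs, 2 inputs
x = [1.0, -1.0]
sharded_out, sharded_collectives = column_parallel_forward(weight, x, tp_size=2)
replicated_out, replicated_collectives = replicated_forward(weight, x, tp_size=2)
```

Both layouts produce the same output; the replicated one stores `tp_size` times more weight elements per rank but skips the collective, which is a reasonable exchange only when the layer is small.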

Test Plan

Test Result

Signed-off-by: Roi Koren <roik@nvidia.com>

gemini-code-assist

Code Review

This pull request replaces ColumnParallelLinear with ReplicatedLinear for the fc1_latent_proj layer in the latent MoE implementation. As described, this is a sound performance optimization for a relatively small layer, trading a small increase in VRAM for a reduction in synchronization overhead. The code change is correct and well-contained. I approve this change.

tlrmchlsmth

Anything you can share on the performance difference? Mainly just curious

Replicate fc1_latent_proj instead of parallelizing it

0d5bd51

Signed-off-by: Roi Koren <roik@nvidia.com>

gemini-code-assist bot reviewed Jan 6, 2026

tlrmchlsmth added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Jan 6, 2026

tlrmchlsmth reviewed Jan 6, 2026

tlrmchlsmth approved these changes Jan 6, 2026

tlrmchlsmth enabled auto-merge (squash) January 6, 2026 14:08

tlrmchlsmth merged commit 28c9477 into vllm-project:main Jan 6, 2026
56 checks passed

LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026

[NemotronH] Use ReplicatedLinear for fc1_latent_proj (vllm-project#31807)

05fb361

Signed-off-by: Roi Koren <roik@nvidia.com>

roikoren755 deleted the feat/latent-in-replicated-linear branch January 6, 2026 19:01

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026

[NemotronH] Use ReplicatedLinear for fc1_latent_proj (vllm-project#31807)

6808f98

Signed-off-by: Roi Koren <roik@nvidia.com>

amirkl94 mentioned this pull request Jan 15, 2026

Use replicated linear latent guyueh1/vllm#6

Merged

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026

[NemotronH] Use ReplicatedLinear for fc1_latent_proj (vllm-project#31807)

7152c31

Signed-off-by: Roi Koren <roik@nvidia.com>

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

[NemotronH] Use ReplicatedLinear for fc1_latent_proj (vllm-project#31807)

237e802

Signed-off-by: Roi Koren <roik@nvidia.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[NemotronH] Use ReplicatedLinear for fc1_latent_proj (vllm-project#31807)

4f19e05

Signed-off-by: Roi Koren <roik@nvidia.com>

tlrmchlsmth merged 1 commit into vllm-project:main from roikoren755:feat/latent-in-replicated-linear
