[MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1#33257

Merged
tlrmchlsmth merged 3 commits into vllm-project:main from CentML:vadim/fix-falcon-fp8-tp
Feb 3, 2026

Conversation

@vadiklyutiy
Collaborator

@vadiklyutiy vadiklyutiy commented Jan 28, 2026

Summary

Enable tensor parallelism (TP > 1) for quantized hybrid Mamba models (e.g., Falcon-H1R-7B with FP8) when n_groups=1.

Root Cause

Custom weight loaders for group replication were only implemented for non-quantized layers. This PR extends support to quantized layers by leveraging the weight_loader property on ModelWeightParameter (which extends BasevLLMParameter).
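A minimal, self-contained sketch of the dispatch described above. These are plain-Python stand-ins, not the actual vLLM classes (the real ones are torch.nn.Parameter and vllm's BasevLLMParameter hierarchy), and `attach_sharded_loader`/`sharded_loader` are hypothetical names standing in for the PR's mamba_v2_sharded_weight_loader wiring:

```python
class PlainParameter:
    """Stand-in for torch.nn.Parameter on non-quantized layers."""
    def __init__(self, data):
        self.data = list(data)


class BasevLLMParameter(PlainParameter):
    """Stand-in: quantized-weight parameters expose a weight_loader property."""
    def __init__(self, data):
        super().__init__(data)
        self._weight_loader = None

    @property
    def weight_loader(self):
        return self._weight_loader

    @weight_loader.setter
    def weight_loader(self, loader):
        self._weight_loader = loader


def sharded_loader(param, loaded_weight):
    """Placeholder for the group-replication loader: with n_groups=1,
    the single group's weights are replicated to every TP rank."""
    param.data = list(loaded_weight)


def attach_sharded_loader(param):
    # The gist of the fix: cover both parameter kinds, not only the plain one.
    if isinstance(param, BasevLLMParameter):
        param.weight_loader = sharded_loader          # quantized path (this PR)
    else:
        setattr(param, "weight_loader", sharded_loader)  # pre-existing path
```

With this, both a quantized and a non-quantized parameter end up with a callable `weight_loader`, so checkpoint loading no longer hits the unsupported-quantization assertion.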

Test

vllm serve tiiuae/Falcon-H1R-7B -tp 2 --quantization fp8

This previously failed; with this PR it works.

Validation (lm_eval)

lm_eval --model local-chat-completions \
  --model_args model=tiiuae/Falcon-H1R-7B,base_url=http://localhost:8000/v1/chat/completions,num_concurrent=250 \
  --tasks gsm8k --apply_chat_template --num_fewshot 5

Results with TP=2 (this PR):

| Tasks | Version | Filter           | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.3283 | ± | 0.0129 |
|       |         | strict-match     |      5 | exact_match | 0.0963 | ± | 0.0081 |

Results with TP=1 (baseline):

| Tasks | Version | Filter           | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.3154 | ± | 0.0128 |
|       |         | strict-match     |      5 | exact_match | 0.1031 | ± | 0.0084 |

Results are consistent within statistical error margins, confirming correctness.
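As a back-of-the-envelope check (not part of the PR), the TP=2 and TP=1 scores can be compared against their combined standard errors; both filters agree well within two combined standard errors:

```python
import math


def consistent(v1, se1, v2, se2, z=2.0):
    """True if |v1 - v2| is within z combined standard errors."""
    return abs(v1 - v2) <= z * math.sqrt(se1 ** 2 + se2 ** 2)


# gsm8k flexible-extract: TP=2 (this PR) vs TP=1 (baseline)
print(consistent(0.3283, 0.0129, 0.3154, 0.0128))  # True
# gsm8k strict-match
print(consistent(0.0963, 0.0081, 0.1031, 0.0084))  # True
```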

Related Issues

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
@vadiklyutiy vadiklyutiy requested a review from tdoublep as a code owner January 28, 2026 13:24
@vadiklyutiy vadiklyutiy self-assigned this Jan 28, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables tensor parallelism for quantized Mamba models with n_groups=1 by extending custom weight loader support to quantized layers. The changes are well-structured and address the issue effectively. The removal of the restrictive assertion and the new logic for applying the mamba_v2_sharded_weight_loader to both quantized and non-quantized weights are correct. The implementation correctly distinguishes between BasevLLMParameter subclasses and standard torch.nn.Parameter to set the weight loader. The changes look good and improve support for quantized Mamba models.

@vadiklyutiy
Collaborator Author

cc @tomeras91


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Contributor

@tomeras91 tomeras91 left a comment


Nice!

@vadiklyutiy vadiklyutiy force-pushed the vadim/fix-falcon-fp8-tp branch 2 times, most recently from 84f3cc8 to b3878ef Compare February 1, 2026 20:52
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
@vadiklyutiy
Collaborator Author

I want to cover this with wider testing, so I'm adding the ready label for that.

@vadiklyutiy vadiklyutiy added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 1, 2026
@vadiklyutiy
Collaborator Author

@tomeras91 @tdoublep
CI passed, the review comment is addressed, and the previously failing Falcon-H1R-7B case is fixed.
Could you take a look?

@tomeras91
Contributor

LGTM
As a final validation, can you double check that NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 still works and produces similar GSM8K results on main and on this PR?

@vadiklyutiy
Collaborator Author

vadiklyutiy commented Feb 2, 2026

> LGTM As a final validation, can you double check that NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 still works and produces similar GSM8K results on main and on this PR?

For NVIDIA-Nemotron-3-Nano-30B-A3B-FP8

Baseline

| Tasks | Version | Filter           | n-shot | Metric      |   | Value  |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|-------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.3169 | ± | 0.0128 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.4792 | ± | 0.0138 |

With PR

| Tasks | Version | Filter           | n-shot | Metric      |   | Value  |   | Stderr |
|-------|--------:|------------------|-------:|-------------|---|-------:|---|-------:|
| gsm8k |       3 | flexible-extract |      5 | exact_match | ↑ | 0.3017 | ± | 0.0126 |
|       |         | strict-match     |      5 | exact_match | ↑ | 0.4693 | ± | 0.0137 |

@mgoin mgoin added the bug Something isn't working label Feb 3, 2026
@tlrmchlsmth tlrmchlsmth merged commit a372f3f into vllm-project:main Feb 3, 2026
45 checks passed
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
…s=1 (vllm-project#33257)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
amitz-nv added a commit to amitz-nv/vllm that referenced this pull request Feb 9, 2026
amitz-nv added a commit to amitz-nv/vllm that referenced this pull request Feb 10, 2026
… n_groups=1 (vllm-project#33257)"

This reverts commit a372f3f.

Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…s=1 (vllm-project#33257)

Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
@vadiklyutiy vadiklyutiy deleted the vadim/fix-falcon-fp8-tp branch March 11, 2026 08:00

Labels

bug: Something isn't working
ready: ONLY add when PR is ready to merge/full CI is needed


Development

Successfully merging this pull request may close these issues.

[Bug]: Quantization In MambaMixer2 Not Supported when Tensor Parallel is enabled

4 participants