
Fix RoutingMethodType.from_topk softmax+renormalize mapping#33792#33836

Closed
baonudesifeizhai wants to merge 7 commits intovllm-project:mainfrom
baonudesifeizhai:routing_method_typefixed

Conversation

@baonudesifeizhai
Contributor

@baonudesifeizhai baonudesifeizhai commented Feb 4, 2026

Purpose

#33792
Align from_topk with routing semantics: softmax + renormalize=True now maps to RoutingMethodType.Renormalize (not RenormalizeNaive).
Add a small unit test covering the from_topk mapping and rejection of invalid scoring functions.

Test Plan

`python -m pytest tests/model_executor/test_routed_experts_capture.py -v`

Test Result

All tests passed.
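As a sketch of what such a mapping test can exercise, the snippet below mirrors the mapping this PR introduces using a stand-in enum and function (the real vLLM `RoutingMethodType.from_topk` lives in the fused-MoE layer and its exact signature may differ; names here are illustrative):

```python
# Stand-in for vllm's RoutingMethodType and from_topk, mirroring the
# mapping introduced by this PR. Illustrative only, not the real API.
from enum import Enum, auto


class RoutingMethodType(Enum):
    Default = auto()
    Renormalize = auto()
    RenormalizeNaive = auto()
    DeepSeekV3 = auto()
    Llama4 = auto()


def from_topk(scoring_func: str, renormalize: bool, top_k: int) -> RoutingMethodType:
    if scoring_func == "sigmoid":
        return RoutingMethodType.Llama4 if top_k == 1 else RoutingMethodType.DeepSeekV3
    if scoring_func == "softmax":
        # The fix: softmax + renormalize=True now maps to Renormalize,
        # not RenormalizeNaive.
        return RoutingMethodType.Renormalize if renormalize else RoutingMethodType.Default
    raise ValueError(f"Unsupported scoring function: {scoring_func}")


assert from_topk("softmax", True, 2) is RoutingMethodType.Renormalize
assert from_topk("softmax", False, 2) is RoutingMethodType.Default
assert from_topk("sigmoid", False, 1) is RoutingMethodType.Llama4
```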

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug in the routing method mapping for softmax + renormalize by introducing a centralized from_topk static method. The refactoring to use this new method across different routers and quantization layers is a good improvement, and the addition of unit tests is appreciated. I have one suggestion to make the logic in the new from_topk method more robust against unsupported top_k values.

Comment on lines +132 to +142
    if scoring_func == "sigmoid":
        return (
            RoutingMethodType.Llama4 if top_k == 1 else RoutingMethodType.DeepSeekV3
        )
    if scoring_func == "softmax":
        return (
            RoutingMethodType.Renormalize
            if renormalize
            else RoutingMethodType.Default
        )
    raise ValueError(f"Unsupported scoring function: {scoring_func}")
Contributor


Severity: high

The logic for sigmoid is too broad. It implicitly maps any top_k value other than 1 to RoutingMethodType.DeepSeekV3. This includes potentially invalid values like 0 or negative numbers, and top_k > 2 which may not be correct for DeepSeekV3. The tests only cover top_k=1 and top_k=2. It would be more robust to explicitly check for supported top_k values and raise an error for unsupported ones.

Suggested change

Before:

    if scoring_func == "sigmoid":
        return (
            RoutingMethodType.Llama4 if top_k == 1 else RoutingMethodType.DeepSeekV3
        )
    if scoring_func == "softmax":
        return (
            RoutingMethodType.Renormalize
            if renormalize
            else RoutingMethodType.Default
        )
    raise ValueError(f"Unsupported scoring function: {scoring_func}")

After:

    if scoring_func == "sigmoid":
        if top_k == 1:
            return RoutingMethodType.Llama4
        if top_k == 2:
            return RoutingMethodType.DeepSeekV3
    elif scoring_func == "softmax":
        return (
            RoutingMethodType.Renormalize
            if renormalize
            else RoutingMethodType.Default
        )
    raise ValueError(
        f"Unsupported scoring function '{scoring_func}' or top_k '{top_k}'")
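The behavioral difference in the reviewer's stricter variant can be sketched with a stand-in enum (illustrative names; not the real vLLM API): sigmoid with a top_k outside {1, 2} now raises instead of silently mapping to DeepSeekV3.

```python
# Stand-in sketch of the reviewer's stricter from_topk variant.
from enum import Enum, auto


class RoutingMethodType(Enum):
    Default = auto()
    Renormalize = auto()
    DeepSeekV3 = auto()
    Llama4 = auto()


def from_topk_strict(scoring_func: str, renormalize: bool, top_k: int) -> RoutingMethodType:
    if scoring_func == "sigmoid":
        if top_k == 1:
            return RoutingMethodType.Llama4
        if top_k == 2:
            return RoutingMethodType.DeepSeekV3
    elif scoring_func == "softmax":
        return RoutingMethodType.Renormalize if renormalize else RoutingMethodType.Default
    # Anything else (unknown scoring_func, or sigmoid with top_k not in {1, 2})
    # falls through to an explicit error instead of a silent default.
    raise ValueError(
        f"Unsupported scoring function '{scoring_func}' or top_k '{top_k}'"
    )


# sigmoid with top_k=0 is now rejected rather than mapped to DeepSeekV3:
try:
    from_topk_strict("sigmoid", False, 0)
except ValueError:
    pass
```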

@mgoin
Member

mgoin commented Feb 5, 2026

Do you have some model evals you could run to check that the changes are expressed and correct for the flashinfer kernels?

@baonudesifeizhai
Contributor Author

cat <<'PY' > /tmp/e2e_check_routing_method.py
import os
import sys

 
os.environ.setdefault("VLLM_ALLOW_INSECURE_SERIALIZATION", "1")

from vllm import LLM, SamplingParams
from vllm.model_executor.layers.fused_moe import FusedMoE

def collect_routing_methods(model):
    rows = []
    for name, mod in model.named_modules():
        if isinstance(mod, FusedMoE):
            rows.append({
                "name": name,
                "scoring_func": getattr(mod, "scoring_func", None),
                "renormalize": getattr(mod, "renormalize", None),
                "routing_method_type": str(getattr(mod, "routing_method_type", None)),
            })
    return rows

def main():
    model = sys.argv[1] if len(sys.argv) > 1 else "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tp = int(os.environ.get("TP_SIZE", "1"))

    llm = LLM(
        model=model,
        tensor_parallel_size=tp,
        dtype="bfloat16",
        max_model_len=4096,
    )


    llm.generate(["Hello"], SamplingParams(max_tokens=1))

    results = llm.llm_engine.model_executor.collective_rpc(
        "apply_model",
        args=(collect_routing_methods,),
        kwargs={},
    )

    for rank, rows in enumerate(results):
        print(f"rank {rank}: {len(rows)} moe layers")
        for row in rows:
            print(
                f"{row['name']} scoring_func={row['scoring_func']} "
                f"renormalize={row['renormalize']} routing_method_type={row['routing_method_type']}"
            )

if __name__ == "__main__":
    main()
PY
export CUDA_VISIBLE_DEVICES=0
python /tmp/e2e_check_routing_method.py mistralai/Mixtral-8x7B-Instruct-v0.1

this branch: model.layers.31.block_sparse_moe.experts scoring_func=softmax renormalize=True routing_method_type=1
main branch: model.layers.31.block_sparse_moe.experts scoring_func=softmax renormalize=True routing_method_type=4

for lmeval .. c

> Do you have some model evals you could run to check that the changes are expressed and correct for the flashinfer kernels?

@dbari
Contributor

dbari commented Feb 5, 2026

For which model are you trying to enable the Flashinfer TRTLLM kernels?

@baonudesifeizhai
Contributor Author

> For which model are you trying to enable the Flashinfer TRTLLM kernels?

mistralai/Mixtral-8x7B-Instruct-v0.1

@baonudesifeizhai
Copy link
Copy Markdown
Contributor Author

baonudesifeizhai commented Feb 6, 2026

@dbari #33919
