
Fix RoutingMethodType.from_topk softmax+renormalize mapping#33792#33836

Closed
baonudesifeizhai wants to merge 7 commits intovllm-project:mainfrom
baonudesifeizhai:routing_method_typefixed

Conversation

@baonudesifeizhai
Contributor

@baonudesifeizhai baonudesifeizhai commented Feb 4, 2026

Purpose

#33792
Align from_topk with routing semantics: softmax + renormalize=True now maps to RoutingMethodType.Renormalize (not RenormalizeNaive).
Add a small unit test covering the from_topk mapping and rejection of invalid scoring functions.

Test Plan

`python -m pytest tests/model_executor/test_routed_experts_capture.py -v`

Test Result

All tests passed.
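As a sketch of what such a mapping test can exercise, the snippet below mirrors the mapping this PR introduces using a stand-in enum and function (the real vLLM `RoutingMethodType.from_topk` lives in the fused-MoE layer and its exact signature may differ; names here are illustrative):

```python
# Stand-in for vllm's RoutingMethodType and from_topk, mirroring the
# mapping introduced by this PR. Illustrative only, not the real API.
from enum import Enum, auto


class RoutingMethodType(Enum):
    Default = auto()
    Renormalize = auto()
    RenormalizeNaive = auto()
    DeepSeekV3 = auto()
    Llama4 = auto()


def from_topk(scoring_func: str, renormalize: bool, top_k: int) -> RoutingMethodType:
    if scoring_func == "sigmoid":
        return RoutingMethodType.Llama4 if top_k == 1 else RoutingMethodType.DeepSeekV3
    if scoring_func == "softmax":
        # The fix: softmax + renormalize=True now maps to Renormalize,
        # not RenormalizeNaive.
        return RoutingMethodType.Renormalize if renormalize else RoutingMethodType.Default
    raise ValueError(f"Unsupported scoring function: {scoring_func}")


assert from_topk("softmax", True, 2) is RoutingMethodType.Renormalize
assert from_topk("softmax", False, 2) is RoutingMethodType.Default
assert from_topk("sigmoid", False, 1) is RoutingMethodType.Llama4
```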

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug in the routing method mapping for softmax + renormalize by introducing a centralized from_topk static method. The refactoring to use this new method across different routers and quantization layers is a good improvement, and the addition of unit tests is appreciated. I have one suggestion to make the logic in the new from_topk method more robust against unsupported top_k values.

Comment on lines +132 to +142
    if scoring_func == "sigmoid":
        return (
            RoutingMethodType.Llama4 if top_k == 1 else RoutingMethodType.DeepSeekV3
        )
    if scoring_func == "softmax":
        return (
            RoutingMethodType.Renormalize
            if renormalize
            else RoutingMethodType.Default
        )
    raise ValueError(f"Unsupported scoring function: {scoring_func}")
Contributor


Severity: high

The logic for sigmoid is too broad. It implicitly maps any top_k value other than 1 to RoutingMethodType.DeepSeekV3. This includes potentially invalid values like 0 or negative numbers, and top_k > 2 which may not be correct for DeepSeekV3. The tests only cover top_k=1 and top_k=2. It would be more robust to explicitly check for supported top_k values and raise an error for unsupported ones.

Suggested change

Before:

    if scoring_func == "sigmoid":
        return (
            RoutingMethodType.Llama4 if top_k == 1 else RoutingMethodType.DeepSeekV3
        )
    if scoring_func == "softmax":
        return (
            RoutingMethodType.Renormalize
            if renormalize
            else RoutingMethodType.Default
        )
    raise ValueError(f"Unsupported scoring function: {scoring_func}")

After:

    if scoring_func == "sigmoid":
        if top_k == 1:
            return RoutingMethodType.Llama4
        if top_k == 2:
            return RoutingMethodType.DeepSeekV3
    elif scoring_func == "softmax":
        return (
            RoutingMethodType.Renormalize
            if renormalize
            else RoutingMethodType.Default
        )
    raise ValueError(
        f"Unsupported scoring function '{scoring_func}' or top_k '{top_k}'")
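The behavioral difference in the reviewer's stricter variant can be sketched with a stand-in enum (illustrative names; not the real vLLM API): sigmoid with a top_k outside {1, 2} now raises instead of silently mapping to DeepSeekV3.

```python
# Stand-in sketch of the reviewer's stricter from_topk variant.
from enum import Enum, auto


class RoutingMethodType(Enum):
    Default = auto()
    Renormalize = auto()
    DeepSeekV3 = auto()
    Llama4 = auto()


def from_topk_strict(scoring_func: str, renormalize: bool, top_k: int) -> RoutingMethodType:
    if scoring_func == "sigmoid":
        if top_k == 1:
            return RoutingMethodType.Llama4
        if top_k == 2:
            return RoutingMethodType.DeepSeekV3
    elif scoring_func == "softmax":
        return RoutingMethodType.Renormalize if renormalize else RoutingMethodType.Default
    # Anything else (unknown scoring_func, or sigmoid with top_k not in {1, 2})
    # falls through to an explicit error instead of a silent default.
    raise ValueError(
        f"Unsupported scoring function '{scoring_func}' or top_k '{top_k}'"
    )


# sigmoid with top_k=0 is now rejected rather than mapped to DeepSeekV3:
try:
    from_topk_strict("sigmoid", False, 0)
except ValueError:
    pass
```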

@mgoin
Member

mgoin commented Feb 5, 2026

Do you have some model evals you could run to check that the changes are expressed and correct for the flashinfer kernels?

@baonudesifeizhai
Contributor Author

cat <<'PY' > /tmp/e2e_check_routing_method.py
import os
import sys

 
os.environ.setdefault("VLLM_ALLOW_INSECURE_SERIALIZATION", "1")

from vllm import LLM, SamplingParams
from vllm.model_executor.layers.fused_moe import FusedMoE

def collect_routing_methods(model):
    rows = []
    for name, mod in model.named_modules():
        if isinstance(mod, FusedMoE):
            rows.append({
                "name": name,
                "scoring_func": getattr(mod, "scoring_func", None),
                "renormalize": getattr(mod, "renormalize", None),
                "routing_method_type": str(getattr(mod, "routing_method_type", None)),
            })
    return rows

def main():
    model = sys.argv[1] if len(sys.argv) > 1 else "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tp = int(os.environ.get("TP_SIZE", "1"))

    llm = LLM(
        model=model,
        tensor_parallel_size=tp,
        dtype="bfloat16",
        max_model_len=4096,
    )


    llm.generate(["Hello"], SamplingParams(max_tokens=1))

    results = llm.llm_engine.model_executor.collective_rpc(
        "apply_model",
        args=(collect_routing_methods,),
        kwargs={},
    )

    for rank, rows in enumerate(results):
        print(f"rank {rank}: {len(rows)} moe layers")
        for row in rows:
            print(
                f"{row['name']} scoring_func={row['scoring_func']} "
                f"renormalize={row['renormalize']} routing_method_type={row['routing_method_type']}"
            )

if __name__ == "__main__":
    main()
PY
export CUDA_VISIBLE_DEVICES=0
python /tmp/e2e_check_routing_method.py mistralai/Mixtral-8x7B-Instruct-v0.1

this branch: model.layers.31.block_sparse_moe.experts scoring_func=softmax renormalize=True routing_method_type=1
main branch: model.layers.31.block_sparse_moe.experts scoring_func=softmax renormalize=True routing_method_type=4

for lmeval .. c

> Do you have some model evals you could run to check that the changes are expressed and correct for the flashinfer kernels?

@dbari
Contributor

dbari commented Feb 5, 2026

For which model are you trying to enable the Flashinfer TRTLLM kernels?

@baonudesifeizhai
Contributor Author

> For which model are you trying to enable the Flashinfer TRTLLM kernels?

mistralai/Mixtral-8x7B-Instruct-v0.1

@baonudesifeizhai
Copy link
Copy Markdown
Contributor Author

baonudesifeizhai commented Feb 6, 2026

@dbari #33919
