Enable Model Config Input via a Centralized Parser in utils.py #13
Conversation
Top-K Softmax Performance Benchmark

Case 1

python bench_moe_topk_softmax.py --model-name Qwen/Qwen3-VL-8B-Instruct --top-k 2

📡 Loading config from Hugging Face: Qwen/Qwen3-VL-8B-Instruct
`torch_dtype` is deprecated! Use `dtype` instead!
⚙️ Overriding top_k = [2] (from CLI)
sweep_params {'num_tokens': [100], 'num_experts': [64], 'top_k': [2], 'dtype': [torch.bfloat16], 'renormalize': [False]}
Testing 1 configurations...
Config: num_tokens=100, num_experts=64, topk=2, dtype=torch.bfloat16, renormalize=False
Starting performance benchmark...
topk-softmax-performance:
   num_tokens  num_experts  topk           dtype  renormalize  SGLang  native
0         100           64     2  torch.bfloat16        False   15.16   86.32

Case 2

python bench_moe_topk_softmax.py

🔧 No --model-name provided. Using CLI args or defaults.
💡 Using default num_experts = 64
💡 Using default top_k = 2
💡 Using default num_layers = 32
💡 Using default hidden_size = 4096
💡 Using default ffn_hidden_size = 11008
💡 Using default num_heads = 32
💡 Using default num_kv_heads = 8
💡 Using default head_dim = 128
💡 Using default vocab_size = 32000
💡 Using default max_seq_len = 32768
💡 Using default norm_eps = 1e-06
💡 Using default architectures = ['LlamaForCausalLM']
💡 Using default dtype = float16
sweep_params {'num_tokens': [100], 'num_experts': [64], 'top_k': [2], 'dtype': [torch.bfloat16], 'renormalize': [False]}
Testing 1 configurations...
Config: num_tokens=100, num_experts=64, topk=2, dtype=torch.bfloat16, renormalize=False
Starting performance benchmark...
topk-softmax-performance:
   num_tokens  num_experts  topk           dtype  renormalize  SGLang  native
0         100           64     2  torch.bfloat16        False    15.2    86.8
benchmark/bench_moe_topk_softmax.py (Outdated)

    return topk_weights, topk_indices


def navtive_topk_softmax(gating_output, topk):
This native function does not support renormalize. You may follow this one in SGLang:
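For reference, below is a minimal sketch of what a renormalize-aware native path could look like, loosely modeled on SGLang's pure-PyTorch top-k routine. The function name and signature are illustrative only, not the exact SGLang API:

import torch


def native_topk_softmax(gating_output: torch.Tensor, topk: int, renormalize: bool = False):
    """Pure-PyTorch reference: softmax over experts, then select the top-k.

    If renormalize is True, the selected top-k weights are rescaled to sum to 1
    per token, matching the fused kernel's renormalize behavior.
    """
    # [num_tokens, num_experts] -> probabilities over experts
    probs = torch.softmax(gating_output.float(), dim=-1)
    topk_weights, topk_indices = torch.topk(probs, topk, dim=-1)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_indices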
So bench_moe_topk_softmax.py is not needed in:
/bin/bash -c "cd /root/sglang/sgl-kernel-xpu/benchmark && python3 bench_flash_attn.py "
Tested in CI? As a performance test? But we don't have performance checks in the benchmark_kernel scripts, such as criteria, scenarios, or fail/pass conditions. I suppose the functionality test has already been enabled in the UT cases, so there is no need to track it with the kernel benchmarks.
Better to add bench_moe_topk_softmax.py first, even if we haven't started tracking it yet.
This PR proposes to enhance the model configuration system by introducing a flexible and reusable mechanism to load model parameters — either from a Hugging Face model config (--model-name) or via manual CLI overrides — instead of relying on hardcoded values. The implementation will be centralized in a new utils.py module to enable consistent configuration handling across multiple benchmark and inference scripts.
This change enables users to:
✅ Enable a native PyTorch op path for topk-softmax
✅ Benchmark models using real-world configurations (e.g., deepseek-ai/DeepSeek-R1)
✅ Override specific parameters (e.g., --num-experts 128)
✅ Avoid hardcoded test configurations
✅ Reuse config logic across tools
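As an illustration of the intended flow, here is a minimal sketch of such a centralized parser. The helper name get_model_config, the DEFAULTS table, and the HF-field mapping are assumptions for illustration and may differ from the actual utils.py:

import argparse

from transformers import AutoConfig

# Fallback values used when neither --model-name nor a CLI override is given.
# These defaults are illustrative and mirror the log output above.
DEFAULTS = {"num_experts": 64, "top_k": 2, "dtype": "float16"}


def get_model_config(args: argparse.Namespace) -> dict:
    """Resolve a model config: HF config first, then CLI overrides, then defaults."""
    config = dict(DEFAULTS)
    if args.model_name:
        print(f"📡 Loading config from Hugging Face: {args.model_name}")
        hf_config = AutoConfig.from_pretrained(args.model_name, trust_remote_code=True)
        # Pull MoE-related fields if the architecture defines them.
        for hf_key, key in (("num_experts", "num_experts"), ("num_experts_per_tok", "top_k")):
            if hasattr(hf_config, hf_key):
                config[key] = getattr(hf_config, hf_key)
    else:
        print("🔧 No --model-name provided. Using CLI args or defaults.")
    # Explicit CLI overrides always win over the HF config and the defaults.
    for key in ("num_experts", "top_k"):
        value = getattr(args, key, None)
        if value is not None:
            print(f"⚙️ Overriding {key} = {value} (from CLI)")
            config[key] = value
    return config


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", type=str, default=None)
    parser.add_argument("--num-experts", type=int, default=None)
    parser.add_argument("--top-k", type=int, default=None)
    print(get_model_config(parser.parse_args()))

With this shape, benchmark scripts such as bench_moe_topk_softmax.py only need to import the shared helper instead of hardcoding their own test configurations.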