@DiweiSun (Collaborator) commented on Sep 24, 2025:

This PR enhances the model configuration system with a flexible, reusable mechanism for loading model parameters, either from a Hugging Face model config (--model-name) or via manual CLI overrides, instead of relying on hardcoded values. The implementation is centralized in a new utils.py module so that configuration handling stays consistent across the benchmark and inference scripts; a sketch of the loading logic is shown after the list below.

This change enables users to:
✅ Enable the native PyTorch op path for top-k softmax

✅ Benchmark models using real-world configurations (e.g., deepseek-ai/DeepSeek-R1)
✅ Override specific parameters (e.g., --num-experts 128)
✅ Avoid hardcoded test configurations
✅ Reuse config logic across tools
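
As a rough illustration (not the exact code in this PR), the loading logic in utils.py could look something like the sketch below; load_model_config, DEFAULTS, and the attribute names are illustrative placeholders, and the default values shown are the ones the benchmark prints when no model is given:

# Illustrative defaults, matching the values printed by the benchmark in Case 2 below.
DEFAULTS = {"num_experts": 64, "top_k": 2, "hidden_size": 4096, "dtype": "float16"}

def load_model_config(args):
    """Merge defaults, an optional Hugging Face config, and CLI overrides (sketch)."""
    config = dict(DEFAULTS)
    if getattr(args, "model_name", None):
        # Pull the real model configuration from Hugging Face.
        from transformers import AutoConfig
        hf_config = AutoConfig.from_pretrained(args.model_name, trust_remote_code=True)
        for key in config:
            if hasattr(hf_config, key):
                config[key] = getattr(hf_config, key)
    # Explicit CLI flags (e.g. --num-experts 128) win over defaults and the HF config.
    for key in config:
        cli_value = getattr(args, key, None)
        if cli_value is not None:
            config[key] = cli_value
    return config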

@DiweiSun force-pushed the kernel_benchmark/topk_softmax branch from 0bba689 to 6029859 on September 25, 2025 03:18.
@DiweiSun (Collaborator, Author) commented on Sep 25, 2025:

Top-K Softmax Performance Benchmark

Case 1

python bench_moe_topk_softmax.py  --model-name Qwen/Qwen3-VL-8B-Instruct --top-k 2
📡 Loading config from Hugging Face: Qwen/Qwen3-VL-8B-Instruct
`torch_dtype` is deprecated! Use `dtype` instead!
⚙️ Overriding top_k = [2] (from CLI)
sweep_params {'num_tokens': [100], 'num_experts': [64], 'top_k': [2], 'dtype': [torch.bfloat16], 'renormalize': [False]}
Testing 1 configurations...
Config: num_tokens=100, num_experts=64, topk=2, dtype=torch.bfloat16, renormalize=False
Starting performance benchmark...
topk-softmax-performance:
   num_tokens  num_experts  topk           dtype  renormalize  SGLang  native
0         100           64     2  torch.bfloat16        False   15.16   86.32

Case 2

python bench_moe_topk_softmax.py  
🔧 No --model-name provided. Using CLI args or defaults.
💡 Using default num_experts = 64
💡 Using default top_k = 2
💡 Using default num_layers = 32
💡 Using default hidden_size = 4096
💡 Using default ffn_hidden_size = 11008
💡 Using default num_heads = 32
💡 Using default num_kv_heads = 8
💡 Using default head_dim = 128
💡 Using default vocab_size = 32000
💡 Using default max_seq_len = 32768
💡 Using default norm_eps = 1e-06
💡 Using default architectures = ['LlamaForCausalLM']
💡 Using default dtype = float16
sweep_params {'num_tokens': [100], 'num_experts': [64], 'top_k': [2], 'dtype': [torch.bfloat16], 'renormalize': [False]}
Testing 1 configurations...
Config: num_tokens=100, num_experts=64, topk=2, dtype=torch.bfloat16, renormalize=False
Starting performance benchmark...
topk-softmax-performance:
   num_tokens  num_experts  topk           dtype  renormalize  SGLang  native
0         100           64     2  torch.bfloat16        False    15.2    86.8

@DiweiSun marked this pull request as ready for review on October 22, 2025 03:26.
return topk_weights, topk_indices


def navtive_topk_softmax(gating_output, topk):
@chunyuan-w commented on this line on Nov 5, 2025.
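
For context, a native PyTorch top-k softmax of this kind is essentially a softmax over the router logits followed by a per-token top-k. A minimal sketch (not the PR's exact implementation; the renormalize flag and the float32 upcast are assumptions) could be:

import torch

def native_topk_softmax_sketch(gating_output, topk, renormalize=False):
    # Softmax over the expert dimension of the router logits (upcast for stability).
    probs = torch.softmax(gating_output.float(), dim=-1)
    # Keep the k largest routing weights and their expert indices for each token.
    topk_weights, topk_indices = torch.topk(probs, topk, dim=-1)
    if renormalize:
        # Optionally rescale the selected weights so they sum to 1 per token.
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_indices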

@chunyuan-w requested a review from airMeng on November 5, 2025 07:25.
@airMeng (Collaborator) left a comment:


So bench_moe_topk_softmax.py is not needed in

/bin/bash -c "cd /root/sglang/sgl-kernel-xpu/benchmark && python3 bench_flash_attn.py "
?

@DiweiSun (Collaborator, Author) commented on Nov 5, 2025:

So bench_moe_topk_softmax.py is not needed in

/bin/bash -c "cd /root/sglang/sgl-kernel-xpu/benchmark && python3 bench_flash_attn.py "

?

Tested in CI? For a performance test? But we don't have performance checks in the benchmark_kernel scripts, such as criteria, scenarios, or pass/fail conditions. I suppose functionality testing is already covered by the UT cases, so there is no need to track it with the kernel benchmarks.

@DiweiSun added the run-ci label on Nov 5, 2025.
@airMeng (Collaborator) commented on Nov 5, 2025:

Better to add bench_moe_topk_softmax.py first, even if we haven't tracked it yet.
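
For example, if the CI step stays a single shell command, the MoE benchmark could be chained onto it along these lines (illustrative only):

/bin/bash -c "cd /root/sglang/sgl-kernel-xpu/benchmark && python3 bench_flash_attn.py && python3 bench_moe_topk_softmax.py"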

@DiweiSun merged commit 1605f6b into sgl-project:main on Nov 6, 2025.
2 checks passed