Enable Model Config Input via a Centralized Parser in utils.py #13
Conversation
Top-K Softmax Performance Benchmark

Case 1

python bench_moe_topk_softmax.py --model-name Qwen/Qwen3-VL-8B-Instruct --top-k 2

📡 Loading config from Hugging Face: Qwen/Qwen3-VL-8B-Instruct
`torch_dtype` is deprecated! Use `dtype` instead!
⚙️ Overriding top_k = [2] (from CLI)
sweep_params {'num_tokens': [100], 'num_experts': [64], 'top_k': [2], 'dtype': [torch.bfloat16], 'renormalize': [False]}
Testing 1 configurations...
Config: num_tokens=100, num_experts=64, topk=2, dtype=torch.bfloat16, renormalize=False
Starting performance benchmark...
topk-softmax-performance:
   num_tokens  num_experts  topk           dtype  renormalize  SGLang  native
0         100           64     2  torch.bfloat16        False   15.16   86.32

Case 2

python bench_moe_topk_softmax.py

🔧 No --model-name provided. Using CLI args or defaults.
💡 Using default num_experts = 64
💡 Using default top_k = 2
💡 Using default num_layers = 32
💡 Using default hidden_size = 4096
💡 Using default ffn_hidden_size = 11008
💡 Using default num_heads = 32
💡 Using default num_kv_heads = 8
💡 Using default head_dim = 128
💡 Using default vocab_size = 32000
💡 Using default max_seq_len = 32768
💡 Using default norm_eps = 1e-06
💡 Using default architectures = ['LlamaForCausalLM']
💡 Using default dtype = float16
sweep_params {'num_tokens': [100], 'num_experts': [64], 'top_k': [2], 'dtype': [torch.bfloat16], 'renormalize': [False]}
Testing 1 configurations...
Config: num_tokens=100, num_experts=64, topk=2, dtype=torch.bfloat16, renormalize=False
Starting performance benchmark...
topk-softmax-performance:
   num_tokens  num_experts  topk           dtype  renormalize  SGLang  native
0         100           64     2  torch.bfloat16        False    15.2    86.8
benchmark/bench_moe_topk_softmax.py (Outdated)

    return topk_weights, topk_indices


def navtive_topk_softmax(gating_output, topk):
This native function does not support renormalize. You may follow this one in SGLang:
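For reference, below is a minimal sketch of what a renormalize-aware native path could look like, loosely modeled on SGLang's pure-PyTorch top-k routine. The function name and signature are illustrative only, not the exact SGLang API:

import torch


def native_topk_softmax(gating_output: torch.Tensor, topk: int, renormalize: bool = False):
    """Pure-PyTorch reference: softmax over experts, then select the top-k.

    If renormalize is True, the selected top-k weights are rescaled to sum to 1
    per token, matching the fused kernel's renormalize behavior.
    """
    # [num_tokens, num_experts] -> probabilities over experts
    probs = torch.softmax(gating_output.float(), dim=-1)
    topk_weights, topk_indices = torch.topk(probs, topk, dim=-1)
    if renormalize:
        topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_indices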
So bench_moe_topk_softmax.py is not needed in:
/bin/bash -c "cd /root/sglang/sgl-kernel-xpu/benchmark && python3 bench_flash_attn.py "
Tested in CI? As a performance test? But we don't have performance checks in the benchmark_kernel scripts, such as criteria, scenarios, or fail/pass conditions. I suppose the functionality test has already been enabled in the UT cases, so there is no need to track it with the kernel benchmarks.
Better to add bench_moe_topk_softmax.py first, even if we haven't started tracking it yet.
This PR proposes to enhance the model configuration system by introducing a flexible and reusable mechanism to load model parameters — either from a Hugging Face model config (--model-name) or via manual CLI overrides — instead of relying on hardcoded values. The implementation will be centralized in a new utils.py module to enable consistent configuration handling across multiple benchmark and inference scripts.
This change enables users to:
✅ Enable a native PyTorch op path for topk-softmax
✅ Benchmark models using real-world configurations (e.g., deepseek-ai/DeepSeek-R1)
✅ Override specific parameters (e.g., --num-experts 128)
✅ Avoid hardcoded test configurations
✅ Reuse config logic across tools
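As an illustration of the intended flow, here is a minimal sketch of such a centralized parser. The helper name get_model_config, the DEFAULTS table, and the HF-field mapping are assumptions for illustration and may differ from the actual utils.py:

import argparse

from transformers import AutoConfig

# Fallback values used when neither --model-name nor a CLI override is given.
# These defaults are illustrative and mirror the log output above.
DEFAULTS = {"num_experts": 64, "top_k": 2, "dtype": "float16"}


def get_model_config(args: argparse.Namespace) -> dict:
    """Resolve a model config: HF config first, then CLI overrides, then defaults."""
    config = dict(DEFAULTS)
    if args.model_name:
        print(f"📡 Loading config from Hugging Face: {args.model_name}")
        hf_config = AutoConfig.from_pretrained(args.model_name, trust_remote_code=True)
        # Pull MoE-related fields if the architecture defines them.
        for hf_key, key in (("num_experts", "num_experts"), ("num_experts_per_tok", "top_k")):
            if hasattr(hf_config, hf_key):
                config[key] = getattr(hf_config, hf_key)
    else:
        print("🔧 No --model-name provided. Using CLI args or defaults.")
    # Explicit CLI overrides always win over the HF config and the defaults.
    for key in ("num_experts", "top_k"):
        value = getattr(args, key, None)
        if value is not None:
            print(f"⚙️ Overriding {key} = {value} (from CLI)")
            config[key] = value
    return config


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", type=str, default=None)
    parser.add_argument("--num-experts", type=int, default=None)
    parser.add_argument("--top-k", type=int, default=None)
    print(get_model_config(parser.parse_args()))

With this shape, benchmark scripts such as bench_moe_topk_softmax.py only need to import the shared helper instead of hardcoding their own test configurations.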