[Performance] qwen3-next improve causal conv1d in prefill phase #10595
Merged
zhyncs merged 4 commits into sgl-project:main on Sep 18, 2025
zhyncs approved these changes on Sep 18, 2025
chenxu140 added a commit to ping1jing2/sglang that referenced this pull request on Sep 20, 2025
* origin/qwen3: (30 commits)
  chore: bump sgl-kernel 0.3.11 (sgl-project#10630)
  feat: add fused moe config for Qwen3-Next-80B-A3B-Instruct on B200 (sgl-project#10631)
  model support: Sarashina2VisionForCausalLM (sgl-project#10632)
  [Performance] Qwen3-Next: speed up update_mamba_state_after_mtp_verify by 10x; e2e up to 3.54% faster (sgl-project#10586)
  [Performance] Qwen3-Next: replace arange to cached query_start_loc_li… (sgl-project#10553)
  [Feature] Speculative decoding support lookahead (sgl-project#9873)
  refactor: use registry for _get_attention_backend_from_str (sgl-project#10629)
  [router] refactor worker to builder pattern 1/n (sgl-project#10628)
  Garbage collector regression in the online server (sgl-project#10621)
  feat: Add FlexAttention Backend for Efficient Sparse Attention (sgl-project#9947)
  Fix bias handling in TritonMoeQuantInfo within quantization/mxfp4.py (sgl-project#10579)
  [Performance] qwen3-next improve causal conv1d in prefill phase (sgl-project#10595)
  Fix sgl_kernel import failure on devices other than CUDA (sgl-project#10610)
  support qwen3-next-fp8 deepep (sgl-project#10622)
  update deepep version for qwen3-next deepep moe (sgl-project#10624)
  Feat/add heartbeat mechanism for nixl conn (sgl-project#10222)
  [RL] Add destroy process group api (sgl-project#9979)
  fix deepep assert when PD disaggregation == null (sgl-project#8274)
  Scale kkt after reduction (sgl-project#10604)
  [improvement] add average input/output token length for hicache benchmark stats output (sgl-project#10525)
  ...
lifuhuang pushed a commit that referenced this pull request on Sep 20, 2025
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
HanHan009527 pushed a commit to HanHan009527/sglang that referenced this pull request on Oct 9, 2025
…project#10595)
Co-authored-by: Jinyan Chen <jinyanc@nvidia.com>
Motivation
When the forward mode is not target_verify, prefill now calls causal_conv1d_fwd from sgl_kernel, which eliminates the bubbles seen in the prefill profile. When the forward mode is target_verify, the Triton causal_conv1d_update is still called for speculative decoding.
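The branching described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: ForwardMode, dispatch_conv, and causal_conv1d_reference are hypothetical names, and the pure-Python reference convolution only stands in for the real kernels (causal_conv1d_fwd in sgl_kernel and the Triton causal_conv1d_update), whose actual signatures are not shown on this page.

```python
from enum import Enum, auto
from typing import Callable, List, Tuple

Vector = List[float]
Kernel = Callable[[Vector, Vector], Vector]


class ForwardMode(Enum):
    """Hypothetical stand-in for the runtime's forward mode."""
    EXTEND = auto()         # ordinary prefill
    TARGET_VERIFY = auto()  # verification step of speculative decoding


def causal_conv1d_reference(x: Vector, weight: Vector) -> Vector:
    """Single-channel causal convolution with zero left padding.

    y[t] depends only on x[t-K+1 .. t], where K = len(weight).
    This is a readable stand-in, not either of the real kernels.
    """
    k = len(weight)
    padded = [0.0] * (k - 1) + list(x)
    return [
        sum(weight[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


def dispatch_conv(
    mode: ForwardMode,
    x: Vector,
    weight: Vector,
    prefill_kernel: Kernel,  # assumed role: sgl_kernel causal_conv1d_fwd
    verify_kernel: Kernel,   # assumed role: Triton causal_conv1d_update
) -> Tuple[str, Vector]:
    """Pick the kernel by forward mode and report which path ran."""
    if mode is ForwardMode.TARGET_VERIFY:
        return "verify", verify_kernel(x, weight)
    return "prefill", prefill_kernel(x, weight)


if __name__ == "__main__":
    path, y = dispatch_conv(
        ForwardMode.EXTEND,
        [1.0, 2.0, 3.0],
        [1.0, 1.0],
        causal_conv1d_reference,
        causal_conv1d_reference,
    )
    print(path, y)
```

Passing the two kernels in as arguments keeps the sketch runnable without sgl_kernel or Triton installed; in the real code path the choice is made inside the model's attention backend rather than through injected callables.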
Modifications
Accuracy Tests
Benchmarking and Profiling (TP=4)
prefill before

prefill after

Checklist