
[CPU] Add native support for Qwen3-next #12305

Closed
blzheng wants to merge 6 commits into sgl-project:main from blzheng:beilei/qwen3_next_native

Conversation


@blzheng blzheng commented Oct 29, 2025

Motivation

This PR adds native support for Qwen3-next on CPU.

Modifications

  1. Add CPU-native implementations for the following operations:
    a. causal_conv1d_fn
    b. causal_conv1d_update
    c. chunk_gated_delta_rule
    d. fused_sigmoid_gating_delta_rule_update
    e. fused_gdn_gating
    f. Qwen3NextRMSNormGated
  2. Fix issues in the AMX backend:
    a. Weight packing dtype check: weight packing did not support torch.float, so this PR adds dtype validation before packing weights.
    b. HybridLinearKVPool layer ID handling: only full attention layers can access get_value_buffer, but layer_id = 0 is not always a full attention layer. This PR updates the logic to handle such cases correctly.
    c. Top-k kernel support: the top-k kernels lacked support for num_experts = 512. This PR adds that configuration.
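As a rough illustration of the decode-path op listed above, here is a minimal, hypothetical sketch of the causal_conv1d_update semantics for a single channel (a rolling window over the last K inputs). The function signature and variable names are illustrative stand-ins, not the actual sglang implementation, which operates on batched tensors.

```python
def causal_conv1d_update(x_t, conv_state, weight, bias=None):
    """One decode step of a depthwise causal 1D conv for a single channel.

    x_t: newest scalar input.
    conv_state: the previous K-1 inputs, oldest first.
    weight: K filter taps, oldest first.
    Returns (y_t, new_conv_state).
    """
    window = conv_state + [x_t]  # last K inputs, oldest first
    y_t = sum(w * v for w, v in zip(weight, window))
    if bias is not None:
        y_t += bias
    return y_t, window[1:]  # drop the oldest input from the state
```

Each decode step consumes one new token and shifts the convolution state by one, so no past activations beyond the last K-1 inputs need to be kept.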

Accuracy Tests

Accuracy on GSM8k:
command line: SGLANG_USE_CPU_ENGINE=1 python -m sglang.launch_server --model Qwen/Qwen3-Next-80B-A3B-Instruct --trust-remote-code --device cpu --tp 4 --dtype bfloat16 --mem-fraction-static 0.8 --max-total-tokens 65536 --disable-overlap-schedule
Accuracy: 0.942
Invalid: 0.000
Latency: 3855.785 s
Output throughput: 42.622 token/s

Benchmarking and Profiling

Checklist

start_q = 0
for i in range(batch_size):
    end_q = query_start_loc[i + 1]
    x_i, final_states = causal_conv1d_ref(
Collaborator

Let's wait until sgl-kernel is merged and then replace all the ref implementations with real kernels.
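For reference, the per-sequence slicing pattern in the snippet above can be sketched in plain Python as follows. This is a simplified single-channel version; causal_conv1d_ref and the helper below are illustrative stand-ins for the ref implementation, not the actual sglang code.

```python
def causal_conv1d_ref(x, weight, bias=None):
    # Depthwise causal conv over one sequence/channel; weight taps oldest first.
    K = len(weight)
    padded = [0.0] * (K - 1) + list(x)  # left-pad so output length == input length
    out = [sum(weight[k] * padded[t + k] for k in range(K)) for t in range(len(x))]
    if bias is not None:
        out = [v + bias for v in out]
    return out

def run_varlen(flat_x, query_start_loc, weight):
    # Mirrors the loop above: slice each packed sequence by its start/end offsets.
    outputs = []
    start_q = 0
    for i in range(len(query_start_loc) - 1):
        end_q = query_start_loc[i + 1]
        outputs.append(causal_conv1d_ref(flat_x[start_q:end_q], weight))
        start_q = end_q
    return outputs
```

Each sequence is convolved independently, so the left padding resets at every query_start_loc boundary rather than leaking state across sequences.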

self.variance_epsilon = eps

def forward(self, hidden_states, gate=None):
    input_dtype = hidden_states.dtype
Collaborator

can we directly use forward_cpu in RMSNorm?
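For context, a gated RMSNorm of this shape can be sketched in plain Python as below, assuming the SiLU gate is applied before normalization as in Mamba-style gated RMSNorm (an assumption; check the actual Qwen3NextRMSNormGated code). Whether plain RMSNorm's forward_cpu can be reused likely hinges on this extra gating step.

```python
import math

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def rms_norm_gated(hidden, weight, eps=1e-6, gate=None):
    # hidden, weight, gate: per-element lists for one row (illustrative only).
    if gate is not None:
        # Assumed gating order: gate is applied before the norm statistics.
        hidden = [h * silu(g) for h, g in zip(hidden, gate)]
    variance = sum(h * h for h in hidden) / len(hidden)
    inv_rms = 1.0 / math.sqrt(variance + eps)
    return [w * h * inv_rms for w, h in zip(weight, hidden)]
```

With gate=None this reduces to ordinary RMSNorm, which is why reusing forward_cpu for the ungated path seems plausible.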

@@ -525,6 +528,9 @@ topk_softmax_cpu(at::Tensor& hidden_states, at::Tensor& gating_output, int64_t t
    case 256:
      LAUNCH_TOPK_SOFTMAX_KERNEL(256);
      break;
    case 512:
      LAUNCH_TOPK_SOFTMAX_KERNEL(512);
      break;
Collaborator

We need to split this into another PR.
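For reference, the computation this kernel dispatches (softmax over the router's gating logits, then top-k selection) can be sketched in plain Python; supporting num_experts = 512 only extends the switch above to that size. The function name and shapes are illustrative, not the sgl-kernel API.

```python
import math

def topk_softmax(gating_logits, k):
    # gating_logits: one token's router logits, length num_experts.
    m = max(gating_logits)
    exps = [math.exp(g - m) for g in gating_logits]  # max-subtracted for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest routing probabilities, highest first.
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    return [probs[i] for i in top], top
```

The templated kernel fixes num_experts at compile time for vectorization, hence the explicit case 512 rather than a runtime loop bound.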

blzheng commented Jan 23, 2026

@yizhang2077 Thanks for the review. This PR is being closed because the necessary changes are already included in #12525.

@blzheng blzheng closed this Jan 23, 2026
