[CPU] Optimize Qwen3-next model on CPU #12525

Merged
Kangyan-Zhou merged 50 commits into sgl-project:main from jianan-gu:qwen-next-cpu-frontend
Jan 30, 2026

@jianan-gu
Contributor

@jianan-gu jianan-gu commented Nov 3, 2025

This PR adds unified CPU optimizations for Qwen3-next models, including:

  1. Add CPU paths that call optimized kernels. These depend on the following sgl-kernel PRs:
    a. chunk_gated_delta_rule: [CPU] Support chunk_gated_delta_rule kernel for Qwen3-Next #12441
    b. fused_sigmoid_gating_delta_rule_update and fused_gdn_gating: [CPU] add mamba fla kernels for Qwen3-next #12324
    c. fused_qkvzba_split_reshape_cat: [CPU] add fused_qkvzba_split_reshape_cat kernel for Qwen3-next #12330
    d. Conv1d (fn/update): [CPU] add support for mamba causal conv1d for qwen3-next #12309
    e. rmsnorm: Add fused_rmsnorm_gated_cpu kernel for CPU to support Qwen3-Next #11577

  2. Fix the padding issue for odd TP sizes (such as TP3/6), including padding for: (1) conv1d weight; (2) linear attention QK and V num heads; (3) dt_bias and A_log; (4) shared_expert_intermediate_size.

  3. Fix issues in the AMX backend (ported from [CPU] Add native support for Qwen3-next #12305):
    a. Weight packing dtype check: weight packing did not support torch.float. This PR adds dtype validation before packing weights.
    b. HybridLinearKVPool layer ID handling: only full attention layers can access get_value_buffer, but layer_id = 0 is not always a full attention layer. This PR updates the logic to handle such cases correctly.
    c. Top-k kernel support: the top-k related kernels lacked support for num_experts = 512. This PR adds support for this configuration.
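
For illustration only, the padding arithmetic in item 2 and the layer lookup in item 3b can be sketched in plain Python. This is not the code from this PR: the helper names (`pad_to_multiple`, `pad_rows`, `first_full_attention_layer`), the `"full_attention"` / `"linear_attention"` labels, and the sizes used below are all hypothetical, and the sketch assumes the padding rounds a dimension up to the next multiple of the TP size and fills the extra entries with zeros.

```python
from typing import List, Sequence


def pad_to_multiple(size: int, multiple: int) -> int:
    """Round `size` up to the nearest multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return ((size + multiple - 1) // multiple) * multiple


def pad_rows(weight: List[List[float]], tp_size: int) -> List[List[float]]:
    """Append zero rows so the row count splits evenly across `tp_size` ranks.

    Hypothetical stand-in for padding a weight such as a conv1d weight;
    the real code operates on tensors, not nested lists.
    """
    rows = len(weight)
    cols = len(weight[0]) if rows else 0
    padded_rows = pad_to_multiple(rows, tp_size)
    return weight + [[0.0] * cols for _ in range(padded_rows - rows)]


def first_full_attention_layer(layer_types: Sequence[str]) -> int:
    """Return the index of the first full-attention layer.

    Avoids assuming that layer 0 is a full-attention layer.
    The string labels are assumptions made for this sketch.
    """
    for idx, kind in enumerate(layer_types):
        if kind == "full_attention":
            return idx
    raise ValueError("no full attention layer found")
```

With a hypothetical 16 heads and TP6, `pad_to_multiple(16, 6)` gives 18, so each rank holds 3 heads and the last two heads are padding. For a hybrid stack that starts with linear attention layers, `first_full_attention_layer` returns the first index that can safely be used for a value-buffer lookup.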

@mingfeima
Collaborator

mingfeima commented Nov 10, 2025

  • merge [Draft] [CPU] Add TP padding for qwen3-next on CPU #12445 into this one
  • change the authors of this PR: Beilei, if so (we still need to track our contributions in open source; these are individual marks)
  • put details in the PR comments: what kind of changes we make, for example: a) adopt qwen3 optimizations; b) fix TP; and so on.

@mingfeima mingfeima added cpu cpu backend performance optimization intel labels Nov 10, 2025
@mingfeima
Collaborator

@jianan-gu rebase.

@jianan-gu jianan-gu changed the title from "[CPU][Draft] Add Qwen3-next CPU optimized frontend" to "[CPU]Add Qwen3-next CPU optimized frontend" Nov 18, 2025
@jianan-gu jianan-gu requested a review from zhyncs as a code owner December 5, 2025 07:30
Collaborator

@yizhang2077 yizhang2077 left a comment

As long as CI passes and the minor suggestions are resolved, this can be merged.

@yizhang2077
Collaborator

/rerun-failed-ci

@jianan-gu
Contributor Author

Checked: the Xeon/XPU CI failures are not related to this PR; they are due to a known issue on the main branch (link: #17460).

@jianan-gu
Contributor Author

Checked: the CI failures are not related to the changes in this PR.

@jianan-gu
Contributor Author

/rerun-failed-ci

@Kangyan-Zhou Kangyan-Zhou merged commit 336dc45 into sgl-project:main Jan 30, 2026
25 of 40 checks passed
charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Jan 30, 2026
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
Co-authored-by: Ma Mingfei <mingfei.ma@intel.com>
Co-authored-by: Fan Yin <1106310035@qq.com>
Labels

cpu cpu backend performance optimization intel run-ci sgl-kernel

5 participants