[CPU] add support for mamba causal conv1d for qwen3-next#12309

Merged

FlamingoPg merged 10 commits into sgl-project:main from mingfeima:pr_qwen3_next_support on Dec 4, 2025
Conversation

@mingfeima (Collaborator) commented Oct 29, 2025

Motivation

add support for mamba causal conv1d for qwen3-next

Modifications

add a kernel file at sgl-kernel/csrc/cpu/mamba/conv.cpp, which implements both causal_conv1d_fwd for prefill and causal_conv1d_update for decode.

  • APIs align with the existing CUDA counterpart
  • supports both batched input and variable-length input
  • the CPU kernels require x to be contiguous on the second-to-last dimension to achieve optimal performance
  • conv_states is shifted to be contiguous on the second-to-last dimension to ensure vectorized loads from memory
  • implemented with avx512-bf16; applying AMX would be pointless since the width is 4 (which corresponds to K in the AMX tinygemm, where it needs to be a multiple of 32)
  • the weight is prepacked into VNNI2 format to remove online prepacking overhead
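For reference, the semantics of the two kernels can be sketched in NumPy (a hypothetical reference matching the shapes of the CUDA causal_conv1d API, not the optimized C++ code in this PR; the optional activation is omitted):

```python
import numpy as np

def causal_conv1d_fwd(x, weight, bias=None):
    """Prefill: depthwise causal conv. x: (dim, seqlen), weight: (dim, width)."""
    dim, seqlen = x.shape
    width = weight.shape[1]
    # left-pad with width-1 zeros so the output at t only sees inputs <= t
    xp = np.concatenate([np.zeros((dim, width - 1), x.dtype), x], axis=1)
    out = np.zeros_like(x)
    for k in range(width):
        out += weight[:, k:k + 1] * xp[:, k:k + seqlen]
    if bias is not None:
        out += bias[:, None]
    return out

def causal_conv1d_update(x_t, conv_state, weight, bias=None):
    """Decode: one new token x_t (dim,); conv_state (dim, width-1) holds history."""
    window = np.concatenate([conv_state, x_t[:, None]], axis=1)  # (dim, width)
    out = (window * weight).sum(axis=1)
    if bias is not None:
        out += bias
    # shift the state left: drop the oldest column, keep the newest width-1
    return out, window[:, 1:]
```

Decoding token by token with causal_conv1d_update, starting from a zero state, reproduces causal_conv1d_fwd column by column; keeping conv_states contiguous on the second-to-last dimension is what lets the per-token window multiply vectorize.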

Accuracy Tests

python /test/srt/cpu/test_causal_conv1d.py

Benchmarking and Profiling

Compare the optimized C++ version against a reference implementation in native torch (which ultimately dispatches to oneDNN).

Performance was collected on a 6th-gen Xeon with 40 cores, using a benchmark script.

### batch = 1, dim = 8192, seqlen = 1024
### causal_conv1d: oneDNN ref: 3.205 ms; opt: 0.239 ms

### batch = 128, dim = 8192, seqlen = 1024
### causal_conv1d: oneDNN ref: 406.583 ms; opt: 36.051 ms
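The benchmark script itself is not reproduced here; a minimal timing harness in the same spirit might look like the sketch below (hypothetical helper names; a pure-NumPy reference stands in for the torch/oneDNN path, so it will not reproduce the numbers above):

```python
import time
import numpy as np

def bench(fn, *args, iters=5, warmup=1):
    """Return average milliseconds per call."""
    for _ in range(warmup):
        fn(*args)
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - t0) / iters * 1e3

def causal_conv1d_ref(x, w):
    # x: (batch, dim, seqlen), w: (dim, width); depthwise causal conv
    width = w.shape[1]
    xp = np.pad(x, ((0, 0), (0, 0), (width - 1, 0)))  # left-pad along seqlen
    out = np.zeros_like(x)
    for k in range(width):
        out += w[None, :, k:k + 1] * xp[:, :, k:k + x.shape[2]]
    return out

# shapes from the batch = 1 run above
x = np.random.randn(1, 8192, 1024).astype(np.float32)
w = np.random.randn(8192, 4).astype(np.float32)
print(f"causal_conv1d ref: {bench(causal_conv1d_ref, x, w):.3f} ms")
```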

NOTE: the main reason for the low performance of the torch native implementation is that PyTorch has no channels-last concept for 1d convolution. In PyTorch, a 1d convolution is mapped to 2d and is therefore always channels first. This triggers:

  • an additional copy (transposing the last 2 dimensions) to make the input contiguous
  • input and weight must be reordered to oneDNN's internal format to use VNNI
  • the output must be reordered from the internal format back to plain format
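The first bullet can be illustrated in NumPy (standing in for torch tensors; shapes taken from the benchmark runs above): the activations are (seqlen, dim) row-major, i.e. effectively channels last, but a channels-first conv1d wants (dim, seqlen), and the transposed view is non-contiguous, forcing a full copy before the convolution can run.

```python
import numpy as np

x = np.zeros((1024, 8192), dtype=np.float32)  # (seqlen, dim), row-major
x_cf = x.T                                    # (dim, seqlen) strided view for a channels-first conv
print(x_cf.flags["C_CONTIGUOUS"])             # False: just a view with swapped strides
x_copy = np.ascontiguousarray(x_cf)           # the "additional copy" from the note above
print(x_copy.flags["C_CONTIGUOUS"])           # True, at the cost of a full pass over memory
```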

Checklist


@mingfeima mingfeima marked this pull request as draft October 29, 2025 02:01
@github-actions github-actions bot added documentation Improvements or additions to documentation performance quant LLM Quantization amd dependencies Pull requests that update a dependency file lora router Multi-modal multi-modal language model deepseek speculative-decoding sgl-kernel labels Nov 6, 2025
@mingfeima mingfeima force-pushed the pr_qwen3_next_support branch from 09d256d to fe93cef Compare November 6, 2025 07:26
@mingfeima mingfeima marked this pull request as ready for review November 6, 2025 08:07
@mingfeima mingfeima removed documentation Improvements or additions to documentation quant LLM Quantization amd dependencies Pull requests that update a dependency file lora router labels Nov 6, 2025
@mingfeima mingfeima added intel cpu cpu backend performance optimization and removed speculative-decoding labels Nov 6, 2025
@mingfeima mingfeima force-pushed the pr_qwen3_next_support branch 2 times, most recently from 4283bf6 to 7db502b Compare December 3, 2025 07:32
@mingfeima (Collaborator, Author) commented:

fix new lint error.

@FlamingoPg FlamingoPg merged commit f90b400 into sgl-project:main Dec 4, 2025
127 of 131 checks passed
tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 12, 2025

Labels

cpu (cpu backend performance optimization), intel, performance, run-ci, sgl-kernel


3 participants