Migrate Intel CPU cases to the test/registered. by 1pikachu · Pull Request #22670 · sgl-project/sglang

1pikachu · 2026-04-13T06:59:23Z

Summary

This PR improves CPU CI coverage and refines Xeon CI triggering.

Changes

1. Run `stage-b-test-cpu` through the unified test runner.

2. Update the CPU per-commit suite to include `stage-b-test-cpu`.

3. Register CPU tests

22 new CPU test files were added under test/registered/cpu/.
Coverage now includes key CPU kernel/operator paths: activation, binding, bmm, causal_conv1d, cpu_graph, decode/extend, flash_attn, gemm, AMX attention backend, mamba, mla, moe, norm, qkv+rope, qwen3, shared_expert, topk, and rope.
utils.py was added to hold shared CPU test helpers (precision thresholds, parametrization, quantization/reference utilities), standardizing assertions and reducing duplicated test logic.
All new CPU tests are registered to suite="stage-b-test-cpu" (mostly with est_time=10) for consistent CI scheduling.
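To illustrate how registering every CPU test to one suite with an `est_time` can feed CI scheduling, here is a minimal, dependency-free sketch. It assumes a hypothetical `register_ci(test_file, suite, est_time)` helper and a greedy longest-first sharding policy; neither is taken from sglang's actual registration code or unified test runner, and the test file names are illustrative.

```python
from collections import defaultdict

# Hypothetical registry (not sglang's real API): suite name -> [(test_file, est_time)].
_REGISTRY = defaultdict(list)


def register_ci(test_file, suite, est_time):
    """Record that `test_file` belongs to `suite` and takes roughly `est_time` seconds."""
    _REGISTRY[suite].append((test_file, est_time))


def shard_suite(suite, num_shards):
    """Split a suite into `num_shards` buckets with a greedy longest-first policy.

    Tests are visited from longest to shortest (ties broken by file name) and each
    one goes to the bucket with the smallest accumulated est_time so far.
    """
    shards = [[] for _ in range(num_shards)]
    loads = [0] * num_shards
    for test_file, est in sorted(_REGISTRY[suite], key=lambda t: (-t[1], t[0])):
        i = loads.index(min(loads))  # least-loaded bucket, lowest index on ties
        shards[i].append(test_file)
        loads[i] += est
    return shards, loads


# Illustrative file names; every test is registered with the same suite and est_time.
names = ["test_activation.py", "test_gemm.py", "test_moe.py", "test_norm.py"]
for name in names:
    register_ci(name, suite="stage-b-test-cpu", est_time=10)

shards, loads = shard_suite("stage-b-test-cpu", num_shards=2)
print(shards, loads)
```

With four tests at `est_time=10` and two shards, each shard receives two tests and 20 estimated seconds. When most tests share the same `est_time`, as described above, the split is effectively count-based; non-uniform estimates are what let a greedy policy balance wall-clock time.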

gemini-code-assist · 2026-04-13T06:59:28Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

1pikachu · 2026-04-13T07:08:56Z

/tag-run-ci-label

1pikachu · 2026-04-13T07:10:08Z

/tag-and-rerun-ci

1pikachu · 2026-04-13T07:32:12Z

/tag-and-rerun-ci

1pikachu · 2026-04-13T07:59:36Z

/tag-and-rerun-ci

1pikachu · 2026-04-13T09:24:48Z

/tag-and-rerun-ci

1pikachu · 2026-04-22T05:57:04Z

/tag-run-cpu-ci-label

1pikachu · 2026-04-22T06:06:10Z

/tag-cpu-and-rerun-ci

1pikachu · 2026-04-22T06:12:19Z

/rerun-failed-ci

mingfeima

What about the test cases under /test/srt/cpu/test_xxx.py?

1pikachu · 2026-05-09T03:54:19Z

test_arm64 still uses test/srt/cpu/test_*.py, so these files are kept for now.

mingfeima · 2026-05-09T07:22:10Z

@gemini-code-assist review this one.

gemini-code-assist

Code Review

This pull request adds a comprehensive suite of CPU-based tests for various kernels and backends, including activation, attention, GEMM, MoE, and normalization, and integrates them into the CI pipeline. The review feedback highlights several technical issues and cleanup opportunities, such as a bug in boolean tensor generation in test_causal_conv1d.py, redundant global state initialization in loops, copy-paste errors in class names, and misleading default device arguments. Additionally, suggestions were made to remove unused classes, correct spelling errors, and update deprecated type hints for better compatibility.

gemini-code-assist · 2026-05-09T07:26:02Z

+        has_initial_states = torch.randint(0, 2, (batch,), dtype=torch.bool).fill_(
+            False
+        )


There are two issues here:

torch.randint does not support dtype=torch.bool and will raise a RuntimeError in standard PyTorch environments.

The .fill_(False) call immediately after generation makes the random initialization pointless.

If the intent was to test with random initial states, use .to(torch.bool) without fill_.

has_initial_states = torch.randint(0, 2, (batch,)).to(torch.bool)

gemini-code-assist · 2026-05-09T07:26:02Z

+    dtype = [torch.float16, torch.bfloat16]
+
+    def _silu_and_mul_test(self, m, n, dtype):
+        set_global_server_args_for_scheduler(ServerArgs(model_path="dummy"))


Calling set_global_server_args_for_scheduler inside _silu_and_mul_test is redundant and inefficient because it is executed for every parameter combination in the test_activation loop. It should be called once at the beginning of test_activation or in a setUp method.

gemini-code-assist · 2026-05-09T07:26:02Z

+class Mod(nn.Module):
+    def __init__(self, input_channel, output_channel, has_bias):
+        super(Mod, self).__init__()
+        self.linear = torch.nn.Linear(input_channel, output_channel, has_bias)
+
+    def forward(self, x):
+        return self.linear(x)


The Mod class is defined but never utilized within this test file. It should be removed to maintain code cleanliness.

gemini-code-assist · 2026-05-09T07:26:02Z

+                atol = rtol = precision[q_pe.dtype]
+                torch.testing.assert_close(q_pe, q_pe_clone, atol=atol, rtol=rtol)
+                torch.testing.assert_close(k_pe, k_pe_clone, atol=atol, rtol=rtol)
+                torch.testing.assert_close(k_pe, k_pe_clone)


This assertion is redundant because the preceding line (161) already verifies that k_pe and k_pe_clone are close within the specified tolerances.
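For reference, `torch.testing.assert_close` treats two values as close when `|actual - expected| <= atol + rtol * |expected|`. The dependency-free sketch below mirrors that criterion on plain Python lists, using a hypothetical dtype-name-keyed `PRECISION` table (the real utils.py keys `precision` by torch dtype, and its values may differ). One caveat on the redundancy point: when `atol` and `rtol` are omitted, `assert_close` falls back to dtype-based default tolerances, so the bare call is only fully redundant if those defaults are no stricter than `precision[dtype]`.

```python
# Hypothetical tolerance table keyed by dtype name so the sketch needs no torch;
# the real utils.py keys `precision` by torch dtype and its values may differ.
PRECISION = {"float32": 1e-5, "float16": 1e-3, "bfloat16": 1e-2}


def all_close(actual, expected, atol, rtol):
    """True if lengths match and every pair satisfies |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


def check_close(actual, expected, dtype_name):
    """Single closeness check using the same value for atol and rtol, as in the test."""
    atol = rtol = PRECISION[dtype_name]
    if not all_close(actual, expected, atol, rtol):
        raise AssertionError(f"values differ beyond atol=rtol={atol} for {dtype_name}")
```

Repeating `check_close` on the same pair with the same dtype cannot change the outcome; only a call with different tolerances adds information.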

mingfeima · 2026-05-11T01:35:01Z

@gemini-code-assist fix ci fail in https://github.com/sgl-project/sglang/actions/runs/25645519731/job/75273585846?pr=22670

gemini-code-assist · 2026-05-11T01:35:20Z

Hello @mingfeima, I am sorry to hear about the CI failure. Since I cannot directly access external links to CI logs, could you please provide the specific error message or the relevant traceback from the failed job? Once you share the error details, I will be happy to analyze the issue and provide a targeted suggestion to help resolve it.

…ack) Brings in upstream sgl-project/sglang main commits since 096ad02 (merge base, Laguna-XS.2 model support). Total: 28 upstream commits composed.

Custom-stack files preserved intact (entirely-ours, byte-identical to origin/main):
- Blackwell CuTe kernel suite (warp_decode_cute, g1_attention_cute, gated_norm_cute, layersplit_cute, fused_store_index_cache)
- TurboQuant 2.5-bit dense KV cache path
- HIGGS 2-bit dense KV cache path (with split-K decode)
- NVFP4 IndexCache dispatcher (active gate)
- quantization_config_dispatch (HF-config-driven runtime routing)
- All custom server-args flags and runtime methods preserved

Verification:
- 200+ merged Python files compile cleanly
- Dispatcher symbol presence verified
- HIGGS pool / TurboQuant pool classes present at expected lines
- compressed_tensors_w4a4_nvfp4_moe imports clean
- All custom server-args flags present (enable_higgs_dense_2bit_kv_cache, enable_turboquant_dense_kv_cache, turboquant_dense_kv_preset, indexer_quantization_declared, higgs_mla_decode_num_splits, etc.)

Manual-merged shared files (auto-merge gave broken/mixed output; cleaned up post-merge):
- python/sglang/srt/disaggregation/mooncake/conn.py: upstream's PR#24932 refactored maybe_send_extra into a state-types-loop. Replayed our LayerSplit NSA state-index-length-mismatch check inside the SWA/NSA branch of the new loop body.
- sgl-kernel/python/sgl_kernel/__init__.py: upstream's PR#23449 (Apple Silicon Metal kernel) wrapped the entire module body in `if darwin/arm64: from sgl_kernel.metal import * else: ...`. The auto-merge duplicated the file body; rewrote cleanly with upstream's structure and re-injected our `g1_gate_forward`, `warp_decode_cute_moe_forward`, and `warp_decode_cute_moe_packed_forward` imports plus `g1_gate_forward` in _DEBUG_EXPORT_NAMES.
- python/sglang/srt/managers/scheduler_output_processor_mixin.py: line 628 still referenced `result.num_accepted_drafts` (renamed by PR sgl-project#25038 to `num_correct_drafts`). Renamed in place.
- python/sglang/srt/observability/scheduler_metrics_mixin.py: a block around the spec-decode logging path had mixed old/new names from auto-merge (lines 553/557/560). Renamed `spec_num_accepted_tokens` -> `spec_num_accept_tokens` and local `num_accepted_drafts` -> `num_correct_drafts` to match the rest of the file.
- test/test_smc_info.py: stub Req mock used the old field names `spec_accepted_drafts` and `update_spec_acceptance_histogram`. Renamed to `spec_num_correct_drafts` and `update_spec_correct_drafts_histogram` per PR sgl-project#24081.

Auto-merge cleanly integrated upstream changes to:
- server_args.py (new fields: prefill_only_disable_kv_cache, weight_loader_drop_cache_after_load, prefill_delayer_queue_min_ratio, prefill_delayer_max_delay_ms, speculative_draft_window_size, etc.)
- mem_cache/memory_pool.py (new NoOpMHATokenToKVPool)
- model_executor/model_runner_kv_cache_mixin.py (NoOpMHATokenToKVPool pool factory + _validate_prefill_only_disable_kv_cache_pool_family)
- layers/attention/nsa_backend.py (spec rename num_accepted_drafts -> num_correct_drafts; num_accepted_tokens -> num_accept_tokens)
- layers/attention/nsa/nsa_indexer.py (new _apply_q_scale_and_softmax_scale compile method; torch.mm replaces deep_gemm wrapper)
- 28+ disaggregation/spec/runner files with mostly clean upstream-side-only integration.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

----- upstream commit subjects (28) -----
fd3eb77 [Cookbook]: add Laguna-XS.2 (Poolside) (sgl-project#24730)
6be1a45 Fix swa component host hit (sgl-project#25085)
693f497 [NPU] use causal_conv1d_update_v2 for performance (sgl-project#24595)
1efe9e2 [Bug Fix] Reject incompatible combination of --disable-cuda-graph-padding and --enable-torch-compile (sgl-project#23903)
8d27ce7 Optimize uvicorn startup command (sgl-project#25041)
b35fd5f [fix] skip legacy minicpmv conv template for MiniCPM-V 4.6 (sgl-project#24998)
7582237 [Tiny Fix] Disable BCG when inner layer_model unresolved (sgl-project#25021)
ca3bc05 Deepseek-v4-Pro share expert tp1 (sgl-project#24949)
a72d3ae [Spec] Multi-layer mamba scatter cleanup; fix positional call bug (sgl-project#25030)
7128533 Revert "Migrate Intel CPU cases to the test/registered." (sgl-project#25044)
1f985c5 [Spec] Rename `accepted_indices` -> `accept_indices`; drop `_token_id` suffix per Rule 5 (sgl-project#25038)
ecf5d84 Migrate Intel CPU cases to the test/registered. (sgl-project#22670)
d7f4761 [PD] Refactor hybrid state transfer (sgl-project#24932)
91907b7 [UnifiedTree]: Fix Unified HiCache tombstone lock release replay (sgl-project#24972)
4ad63ad [Spec] Rename `accepted_drafts` -> `correct_drafts` for unambiguous naming (sgl-project#24081)
6bfb365 [PD] Rate limit prefill inflight polling warnings (sgl-project#24967)
6bb79c1 [Linear Attn] Add CUSTOM enum and plugin extensibility for kernel backends (sgl-project#24937)
cfc41d5 Fix kimi k2.5 mla eagle + dp attention (sgl-project#25033)
0f3932c [Fix] Qwen3-ASR config: set thinker_config before super().__init__ (sgl-project#24187)
f526e3f [Spec] Mamba scatter cleanup; fix multi-layer positional bug; dflash naming (sgl-project#25029)
10375a1 [NIXL][XPU] Fix uint64 overflow for mismatched P/D TP sizes (e.g. prefill_tp=1, decode_tp=2) (sgl-project#24648)
0a37d24 [diffusion] hardware: support sage attention backend on MUSA (attn backend, 21/N) (sgl-project#24752)
5495026 [HiCache] feat: default storage prefetch timeout (sgl-project#23309)
186eb42 Feat: Support SWA (Sliding Window Attention) for EAGLE-3 drafter (sgl-project#24664)
a75b79e Feat: Support newer EAGLE-3 drafters (sgl-project#24663)
f3a8189 [Spec] Internal rename per N2 v2 naming rule (sgl-project#25014)
bfc2eda [MUSA] Use MUSA-optimized operators in piecewise CUDA graph (sgl-project#23633)
74d70af [Apple Silicon] Add Metal kernel support in sgl-kernel (sgl-project#23449)

github-actions Bot added the run-ci label Apr 13, 2026

1pikachu requested review from Fridge003, Kangyan-Zhou, bingxche, ispobock and merrymercy as code owners April 13, 2026 07:31

1pikachu marked this pull request as draft April 13, 2026 07:32

1pikachu marked this pull request as ready for review April 13, 2026 07:59

mingfeima reviewed Apr 22, 2026

Comment thread .github/workflows/slash-command-handler.yml Outdated

mingfeima changed the title ~~[Draft PR] update CPU CI suite~~ [Test Only: Do not merge] update CPU CI suite Apr 22, 2026

1pikachu force-pushed the main branch 2 times, most recently from d2a34c6 to be973be May 6, 2026 08:35

github-actions Bot requested review from JustinTong0323 and wisclmy0611 as code owners May 6, 2026 13:48

github-actions Bot added the documentation label May 7, 2026

1pikachu force-pushed the main branch from 9307273 to a7b9c08 May 8, 2026 05:37

mingfeima added intel cpu cpu backend performance optimization labels May 8, 2026

1pikachu changed the title ~~[Test Only: Do not merge] update CPU CI suite~~ Migrate Intel CPU cases to the test/registered. May 8, 2026

mingfeima reviewed May 9, 2026

Comment thread docs_new/index.mdx

1pikachu force-pushed the main branch from cbb11f8 to dda8c8e May 9, 2026 02:20

1pikachu force-pushed the main branch 3 times, most recently from 5805005 to 984f9b9 May 9, 2026 03:52

gemini-code-assist Bot reviewed May 9, 2026

mingfeima approved these changes May 11, 2026

Comment thread test/registered/cpu/test_extend.py

Comment thread test/registered/cpu/test_extend.py

Migrate Intel CPU cases to the test/registered

c5e2a4b

1pikachu force-pushed the main branch from 9a27ab4 to c5e2a4b May 11, 2026 02:54

ranimandepudi mentioned this pull request May 11, 2026

Add Arm64 INT8 MoE test coverage #25007

Open

mingfeima merged commit ecf5d84 into sgl-project:main May 12, 2026
94 of 124 checks passed

mingfeima mentioned this pull request May 12, 2026

Revert "Migrate Intel CPU cases to the test/registered." #25044

Merged

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

Migrate Intel CPU cases to the test/registered. (sgl-project#22670)

eb7cc13

xjpang pushed a commit to xjpang/sglang that referenced this pull request May 13, 2026

Migrate Intel CPU cases to the test/registered. (sgl-project#22670)

12afa85
