
Support getting checksums in weight checker #24537

Merged
fzyzcjy merged 13 commits into sgl-project:main from fzyzcjy:weight_ft/2
May 6, 2026

Conversation

@fzyzcjy
Collaborator

@fzyzcjy fzyzcjy commented May 6, 2026

Motivation

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

fzyzcjy added 11 commits May 6, 2026 21:34

Cover _random_like dtype branches, _postprocess_tensors (non-persistent
buffer skip, fp8 quant pair handling for both fp32 and ue8m0-packed
scales), _check_tensors error paths, and the WeightChecker class
lifecycle (snapshot, reset_tensors, compare, handle dispatch). Real fp8
tensors are constructed via quant_weight_ue8m0/transform_scale_ue8m0; no
mocks of fp8 utilities.
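
A minimal self-contained sketch of the round trip those tests exercise, assuming per-tensor power-of-two scaling (the real quant_weight_ue8m0/transform_scale_ue8m0 helpers in sglang operate blockwise and pack the scales; this only illustrates the ue8m0 idea that a scale is a bare 8-bit exponent, i.e. a power of two):

```python
import torch

def quant_ue8m0_sketch(w: torch.Tensor):
    # Round the needed per-tensor scale up to a power of two (448 = fp8 e4m3 max).
    amax = w.abs().amax().clamp(min=1e-12)
    scale = torch.exp2(torch.ceil(torch.log2(amax / 448.0)))
    qweight = (w / scale).to(torch.float8_e4m3fn)
    return qweight, scale

def dequant_sketch(qweight, scale, dtype=torch.bfloat16):
    return qweight.to(dtype) * scale

w = torch.randn(128, 128)
q, s = quant_ue8m0_sketch(w)
# fp8 e4m3 carries roughly 6% relative precision; the power-of-two scale is exact.
torch.testing.assert_close(dequant_sketch(q, s).float(), w, rtol=0.07, atol=s.item())
```
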
Launches a real sgl server and exercises snapshot/compare/reset_tensors
plus an unknown-action negative case. Cases that mutate weights are
named to sort last so they cannot affect earlier cases sharing the
server.
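
The ordering trick relies on unittest's default alphabetical test-method ordering, so a late-sorting prefix is enough to push mutating cases to the end. A hypothetical illustration (class and method names are made up):

```python
import unittest

class TestWeightCheckerE2E(unittest.TestCase):
    def test_a_snapshot_then_compare(self):  # read-only; runs first
        ...

    def test_z_reset_tensors(self):  # mutates weights; sorts last
        ...
```
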
Two new cases on top of the existing snapshot/compare/reset coverage:
  - update_weights_from_tensor with a divergent tensor must make compare
    fail and surface the param name in the error message
  - update_weights_from_tensor with byte-identical bytes (prime, snapshot,
    push the same bytes again) must keep compare passing
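
Roughly what the two cases assert, heavily hedged: `engine`, `snapshot`, and `check_weights` below are stand-ins for the real test fixtures, and only `update_weights_from_tensor` is an actual sglang entry point.

```python
import torch

name = "model.layers.0.mlp.up_proj.weight"

# Case 1: a divergent tensor must fail compare and surface the param name.
engine.update_weights_from_tensor(named_tensors=[(name, torch.randn_like(snapshot[name]))])
ok, message = check_weights(engine, action="compare")  # hypothetical helper
assert not ok and name in message

# Case 2: byte-identical bytes must keep compare passing.
engine.update_weights_from_tensor(named_tensors=[(name, snapshot[name].clone())])
ok, _ = check_weights(engine, action="compare")
assert ok
```
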
Qwen3-0.6B is smaller than the previous Llama-3.2-1B-Instruct default,
shortening server launch and prefill steps. The fused name
gate_up_proj.weight is sglang's actual on-disk parameter name (no HF
remapping in the path), so the test exercises the parameter unambiguously.

Matches sglang's inference-time nn.Parameter convention so that
_reset_tensors can do in-place copy_ without autograd rejecting it.
Fixes: RuntimeError 'a leaf Variable that requires grad is being used
in an in-place operation' from the reset / compare-after-reset cases.
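
The failure and the fix reproduce in plain PyTorch, independent of sglang:

```python
import torch
from torch import nn

p_grad = nn.Parameter(torch.randn(4))  # requires_grad=True by default
try:
    p_grad.copy_(torch.zeros(4))       # in-place write into a grad-requiring leaf
except RuntimeError as e:
    print(e)  # a leaf Variable that requires grad is being used in an in-place operation

p_infer = nn.Parameter(torch.randn(4), requires_grad=False)  # inference-time convention
p_infer.copy_(torch.zeros(4))          # fine: _reset_tensors can restore in place
```
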
_snapshot's '.detach().cpu()' is a no-op on a CPU tensor, so a CPU-only
fixture leaves the snapshot aliasing live storage and masks reset-then-
compare divergence. Putting the fixture on CUDA mirrors production
(model is always on the device) and forces _snapshot to produce a real
independent CPU copy.
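
The aliasing is easy to see in plain PyTorch: .cpu() on a tensor that already lives on the CPU returns the same storage, so the "snapshot" silently tracks later mutations, while a CUDA source forces a genuine copy.

```python
import torch

w = torch.randn(4)              # CPU weight
snap = w.detach().cpu()         # no-op move: shares storage with w
w.add_(1.0)
assert torch.equal(snap, w)     # the "snapshot" drifted along with the weight

if torch.cuda.is_available():
    w_gpu = torch.randn(4, device="cuda")
    snap2 = w_gpu.detach().cpu()   # real device transfer: independent copy
    w_gpu.add_(1.0)
    assert not torch.equal(snap2, w_gpu.cpu())
```
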
Sending the fused gate_up_proj.weight name directly trips a name.replace
collision in sglang's stacked_params_mapping (gate_up_proj contains the
substring up_proj), producing the bogus key gate_gate_up_proj.weight and
crashing the model loader. Use the HF unfused alias up_proj.weight with
shape (intermediate_size, hidden_size); sglang rewrites it onto the
fused tensor with shard_id=1, writing only the up half — sufficient to
make compare detect a divergence.
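
The collision itself is two lines of plain Python, since stacked_params_mapping entries are applied with str.replace and "gate_up_proj" contains "up_proj" as a substring:

```python
fused = "model.layers.0.mlp.gate_up_proj.weight"
print(fused.replace("up_proj", "gate_up_proj"))
# -> model.layers.0.mlp.gate_gate_up_proj.weight  (bogus key; loader crashes)

unfused = "model.layers.0.mlp.up_proj.weight"      # the HF alias sidesteps the trap
print(unfused.replace("up_proj", "gate_up_proj"))
# -> model.layers.0.mlp.gate_up_proj.weight       (lands on the fused tensor, shard_id=1)
```
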
Hoists the per-callsite skip-pattern lists in _reset_tensors and
_postprocess_tensors to a single module-level _NON_PERSISTENT_BUFFER_PATTERNS
tuple, accessed through _is_non_persistent_buffer_name. The unified set is
the union of the two prior lists: cos_sin_cache, inv_freq, freqs_cis,
_weight_fp32 — both callsites skip the same buffers now (previously
_reset_tensors was missing inv_freq and _postprocess_tensors was missing
freqs_cis).
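
A sketch of the hoisted module-level state, using the pattern names given above; substring matching is an assumption about how the helper tests names:

```python
_NON_PERSISTENT_BUFFER_PATTERNS = (
    "cos_sin_cache",
    "inv_freq",
    "freqs_cis",
    "_weight_fp32",
)

def _is_non_persistent_buffer_name(name: str) -> bool:
    # Both _reset_tensors and _postprocess_tensors now consult the same set.
    return any(pattern in name for pattern in _NON_PERSISTENT_BUFFER_PATTERNS)
```
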
Adds an action='checksum' route to WeightChecker.handle that returns a
dict produced by pydantic ChecksumInfo, containing per-tensor hashes
(hex of tensor_hash from mm_utils, GPU-accelerated via the existing
gpu_tensor_hash triton kernel) plus this rank's ParallelismInfo
(tp/dp/pp coordinates + global rank/size from torch.distributed).

The computation reuses _postprocess_tensors so fp8 weights are
dequantized to bf16 before hashing — two (qweight, scale) pairs that
dequant to the same bf16 produce the same checksum, matching the
semantics of the existing snapshot/compare path. Surrounded by
torch.cuda.synchronize() and timed via logger.info so callers can
observe per-rank duration.

handle() now returns Optional[Dict] — None for snapshot/reset/compare
and the dict payload for checksum.
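
A hedged sketch of the payload schema; field names are inferred from this description, and the real definitions (plus the tensor_hash/gpu_tensor_hash plumbing) live in sglang:

```python
from typing import Dict
from pydantic import BaseModel

class ParallelismInfo(BaseModel):
    tp_rank: int
    dp_rank: int
    pp_rank: int
    global_rank: int
    global_size: int

class ChecksumInfo(BaseModel):
    # param name -> hex digest of tensor_hash(dequantized bf16 tensor)
    tensor_hashes: Dict[str, str]
    parallelism: ParallelismInfo
```
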
Plumbs the optional dict returned by WeightChecker.handle from
model_runner up to the /weights_checker HTTP body:

- model_runner.check_weights now returns the underlying handle() value.
- CheckWeightsReqOutput gains an optional payload: Dict carrying one
  rank's ChecksumInfo dict.
- Scheduler's check_weights captures payload into the output.
- TokenizerManager.check_weights now returns
  (success, message, ranks) where ranks is the per-rank list collected
  naively in fan-out order (None when no rank produced a payload).
  FanOutCommunicator.merge_results is left untouched so the 11+ existing
  2-tuple callers keep working.
- /weights_checker HTTP body adds a top-level 'ranks' key when present,
  preserving the prior 'success'/'message' shape for the snapshot, reset,
  and compare actions.
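
Illustratively, the new HTTP surface looks like this; the endpoint path comes from the description above, while the request body shape ({"action": ...}) is an assumption:

```python
import requests

resp = requests.post("http://localhost:30000/weights_checker", json={"action": "checksum"})
body = resp.json()
# snapshot/reset/compare keep the prior shape: {"success": ..., "message": ...}
# action=checksum adds a top-level per-rank list:
#   {"success": ..., "message": ..., "ranks": [<ChecksumInfo dict>, ...]}
print(body["success"], len(body.get("ranks", [])))
```
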
Adds unit coverage for ChecksumInfo / ParallelismInfo /
_is_non_persistent_buffer_name / _hash_tensor / _compute_checksum:
hash stability and hex format, parallelism info reflection,
post-mutation hash drift, and round-trip through the strict pydantic
schema. Extends TestHandle to cover the new 'checksum' route.

The e2e test gains four cases on the shared engine: response shape,
two-call stability, hash drift after update_weights_from_tensor, and
absence of non-persistent buffer names in the checksum keys.
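
The two-call stability case reduces to something like the following, with get_checksums standing in for whatever helper the test uses to call the endpoint:

```python
first = get_checksums(engine)    # {param_name: hex_hash, ...} per rank (hypothetical helper)
second = get_checksums(engine)
assert first == second           # hashing is deterministic across calls
assert not any(_is_non_persistent_buffer_name(k) for k in first)  # buffers excluded
```
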
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@fzyzcjy
Collaborator Author

fzyzcjy commented May 6, 2026

/tag-and-rerun-ci

@github-actions github-actions Bot added the run-ci label May 6, 2026
@fzyzcjy fzyzcjy merged commit c4c5541 into sgl-project:main May 6, 2026
59 of 68 checks passed
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 7, 2026
* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#	python/sglang/srt/managers/scheduler.py
#	python/sglang/srt/model_executor/model_runner.py
LLThomas pushed a commit to LLThomas/sglang that referenced this pull request May 8, 2026
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026