[NPU]Documentation update for communications quantization feature by egvenediktov · Pull Request #24668 · sgl-project/sglang

egvenediktov · 2026-05-08T06:59:17Z

Motivation

This PR updates the documentation for the recently introduced communications quantization feature, enabled with the --enable-quant-communications argument from #20520.

Modifications

1 file changed:
docs_new/docs/advanced_features/server_arguments.mdx (Added description for the argument)

Accuracy Tests

This PR does not affect accuracy.

Speed Tests and Profiling

This PR does not affect inference performance.

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

gemini-code-assist · 2026-05-08T06:59:20Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

OrangeRedeng · 2026-05-08T13:46:59Z

LGTM

* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#   python/sglang/srt/utils/common.py

Update server_arguments.mdx

72b485c

egvenediktov requested review from JustinTong0323 and wisclmy0611 as code owners May 8, 2026 06:59

github-actions Bot added the documentation Improvements or additions to documentation label May 8, 2026

egvenediktov changed the title ~~Documentation update for communications quantization~~ Documentation update for communications quantization feature May 8, 2026

ping1jing2 self-assigned this May 8, 2026

ping1jing2 changed the title ~~Documentation update for communications quantization feature~~ [NPU]Documentation update for communications quantization feature May 8, 2026

ping1jing2 approved these changes May 10, 2026

sglang-npu-bot merged commit 2473659 into sgl-project:main May 10, 2026
53 checks passed

egvenediktov deleted the docs_update branch May 12, 2026 09:20

sglang-npu-bot merged 1 commit into sgl-project:main from egvenediktov:docs_update

4 participants
