[NPU]Documentation update for communications quantization feature #24668

Merged
sglang-npu-bot merged 1 commit into sgl-project:main from egvenediktov:docs_update
May 10, 2026

@egvenediktov
Contributor

@egvenediktov egvenediktov commented May 8, 2026

Motivation

This PR updates the documentation for the recently introduced communications quantization feature, --enable-quant-communications, added in #20520.

Modifications

1 file changed:
docs_new/docs/advanced_features/server_arguments.mdx (Added description for the argument)

Accuracy Tests

PR does not affect accuracy.

Speed Tests and Profiling

PR does not affect inference performance.

Checklist

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions Bot added the documentation label May 8, 2026
@egvenediktov egvenediktov changed the title from "Documentation update for communications quantization" to "Documentation update for communications quantization feature" May 8, 2026
@ping1jing2 ping1jing2 self-assigned this May 8, 2026
@ping1jing2 ping1jing2 changed the title from "Documentation update for communications quantization feature" to "[NPU]Documentation update for communications quantization feature" May 8, 2026
@OrangeRedeng
Contributor

LGTM

@sglang-npu-bot sglang-npu-bot merged commit 2473659 into sgl-project:main May 10, 2026
53 checks passed
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 11, 2026
* main: (87 commits)
  [Fix] Disable FlashInfer allreduce fusion under deterministic inference (sgl-project#24629)
  fix: STANDALONE spec-decode hidden-size mismatch crash (sgl-project#24217)
  Followup fix for Custom AR V2 in non NVL scenarios (sgl-project#24742)
  Fix reduce_scatterv producer contract for SUM_LEN (sgl-project#24785)
  [NPU]Documentation update for communications quantization feature (sgl-project#24668)
  [Session R3] Add routed_experts_start_len for absolute routing slice control (sgl-project#24851)
  [Model] Add MiniCPM-V 4.6 support (sgl-project#24855)
  Support Intern-S2-Preview (sgl-project#24875)
  [PD] Unify dsv4 dispatch with swa (sgl-project#24888)
  Optimize MHC pipeline: DeepGemm, fused norm, fused hc_head (sgl-project#24775)
  Fix PD bootstrap failure handling (sgl-project#24772)
  [Spec] Cleanup idle stub and shape-check patterns (sgl-project#24881)
  [Bug] Add dsv4 state_type branch to mooncake disaggregation (sgl-project#24878)
  [Spec V1] Split draft-extend phase from `EagleDraftInput` into new `EagleDraftExtendInput` (sgl-project#24859)
  [Gemma4] Optimize Gemm4 with fused Q/K/V RMSNorm + per-expert FP8 ckpt loader (sgl-project#24696)
  [spec decoding] support kimi-k2.5-eagle3-mla (sgl-project#24826)
  [SPEC V2] fix: skip stale state updates in spec-v2 overlap (sgl-project#23456)
  [RL] Call torch.cuda.empty_cache() for `in-place` pause mode to avoid OOM (sgl-project#24854)
  [diffusion] CI: add cache-dit CI tests (sgl-project#19213)
  [Utils] Make request dump robust to unpicklable server_args and large meta_info (sgl-project#24767)
  ...

# Conflicts:
#	python/sglang/srt/utils/common.py
@egvenediktov egvenediktov deleted the docs_update branch May 12, 2026 09:20