Merged
30 commits
b0244e1
Add mm_fp4 trtllm backend
wenscarl Oct 30, 2025
69e7be5
Merge branch 'sgl-project:main' into mm_fp4_trtllm
wenscarl Nov 3, 2025
153f7b1
Use str env var
wenscarl Nov 3, 2025
d3e2ed6
Address comment.
wenscarl Nov 3, 2025
a4ad51e
Merge branch 'main' into mm_fp4_trtllm
wenscarl Nov 3, 2025
4488840
Merge branch 'main' into mm_fp4_trtllm
Fridge003 Nov 4, 2025
d63fd56
Fix typo.
wenscarl Nov 4, 2025
94d32b6
Wip
wenscarl Nov 6, 2025
77e2462
wip
wenscarl Nov 19, 2025
9c570b0
e2e
wenscarl Dec 18, 2025
f3d367e
Merge remote-tracking branch 'origin/main' into trtllm_mnnvl_ar_integ…
wenscarl Dec 18, 2025
29b7f22
Merge remote-tracking branch 'origin/main' into trtllm_mnnvl_ar_integ…
wenscarl Feb 10, 2026
4da0b52
Unified fusion api
wenscarl Feb 11, 2026
817f245
rm flashinfer_allreduce.py
wenscarl Feb 11, 2026
5040814
Deprecate enable_flashinfer_allreduce_fusion
wenscarl Feb 13, 2026
fd4591c
Guard sm version
wenscarl Feb 13, 2026
45d894a
Address comments
wenscarl Feb 24, 2026
e4efcab
Upd
wenscarl Feb 24, 2026
a63fbb5
fix merge conflict
wenscarl Feb 24, 2026
dd7861f
remove redudant code
wenscarl Feb 24, 2026
a4a04cc
Remove nnodes==1.
wenscarl Mar 5, 2026
c12cd3b
Upd
wenscarl Mar 5, 2026
ab46fd8
Upd doc
wenscarl Mar 6, 2026
aafb113
Merge remote-tracking branch 'origin/main' into trtllm_mnnvl_ar_integ…
wenscarl Mar 6, 2026
d15415d
Address comments
wenscarl Mar 9, 2026
e03e935
Merge branch 'main' into trtllm_mnnvl_ar_integration
Fridge003 Mar 11, 2026
cd77ec2
Merge remote-tracking branch 'origin/main' into trtllm_mnnvl_ar_integ…
wenscarl Mar 16, 2026
adaf673
Minor fix
wenscarl Mar 16, 2026
dfb4366
Merge branch 'main' into trtllm_mnnvl_ar_integration
Fridge003 Mar 16, 2026
537d595
Merge branch 'main' into trtllm_mnnvl_ar_integration
Fridge003 Mar 16, 2026
3 changes: 2 additions & 1 deletion docs/advanced_features/server_arguments.md
@@ -314,7 +314,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
| `--moe-a2a-backend` | Select the backend for all-to-all communication for expert parallelism. | `none` | `none`, `deepep`, `mooncake`, `mori`, `nixl`, `ascend_fuseep`|
| `--moe-runner-backend` | Choose the runner backend for MoE. | `auto` | `auto`, `deep_gemm`, `triton`, `triton_kernel`, `flashinfer_trtllm`, `flashinfer_trtllm_routed`, `flashinfer_cutlass`, `flashinfer_mxfp4`, `flashinfer_cutedsl`, `cutlass` |
| `--flashinfer-mxfp4-moe-precision` | Choose the computation precision of flashinfer mxfp4 moe | `default` | `default`, `bf16` |
-| `--enable-flashinfer-allreduce-fusion` | Enable FlashInfer allreduce fusion with Residual RMSNorm. | `False` | bool flag (set to enable) |
+| `--flashinfer-allreduce-fusion-backend` | Enable FlashInfer allreduce fusion (fused allreduce + Residual + RMSNorm) and choose backend. When not set, the feature is disabled. Options: `auto` (choose best), `trtllm` (SM90/100, single-node only), `mnnvl` (SM100, single/multi-node). Backend support table (SM100/SM90, single/multi-node) is in `sglang.srt.layers.flashinfer_comm_fusion`. | `None` | `auto`, `trtllm`, `mnnvl` |
| `--enable-aiter-allreduce-fusion` | Enable aiter allreduce fusion with Residual RMSNorm. | `False` | bool flag (set to enable) |
| `--deepep-mode` | Select the mode when enable DeepEP MoE, could be `normal`, `low_latency` or `auto`. Default is `auto`, which means `low_latency` for decode batch and `normal` for prefill batch. | `auto` | `normal`, `low_latency`, `auto` |
| `--ep-num-redundant-experts` | Allocate this number of redundant experts in expert parallel. | `0` | Type: int |
@@ -563,6 +563,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
| `--enable-flashinfer-trtllm-moe` | NOTE: --enable-flashinfer-trtllm-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_trtllm' instead. | `None` | N/A |
| `--enable-triton-kernel-moe` | NOTE: --enable-triton-kernel-moe is deprecated. Please set `--moe-runner-backend` to 'triton_kernel' instead. | `None` | N/A |
| `--enable-flashinfer-mxfp4-moe` | NOTE: --enable-flashinfer-mxfp4-moe is deprecated. Please set `--moe-runner-backend` to 'flashinfer_mxfp4' instead. | `None` | N/A |
+| `--enable-flashinfer-allreduce-fusion` | NOTE: --enable-flashinfer-allreduce-fusion is deprecated. Please set `--flashinfer-allreduce-fusion-backend=auto` instead. | `None` | N/A |
| `--crash-on-nan` | Crash the server on nan logprobs. | `False` | Type: str |
| `--hybrid-kvcache-ratio` | Mix ratio in [0,1] between uniform and hybrid kv buffers (0.0 = pure uniform: swa_size / full_size = 1)(1.0 = pure hybrid: swa_size / full_size = local_attention_size / context_length) | `None` | Optional[float] |
| `--load-watch-interval` | The interval of load watching in seconds. | `0.1` | Type: float |
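The deprecation note above implies that the old boolean flag maps onto `--flashinfer-allreduce-fusion-backend=auto`. A minimal sketch of one way such a mapping could be done with `argparse` follows. The helper names `build_parser` and `resolve_backend` are hypothetical, and the choice to let an explicitly set backend win over the deprecated flag is an assumption; the actual handling in `server_args.py` is not shown in this diff.

```python
import argparse
import warnings

BACKENDS = ("auto", "trtllm", "mnnvl")  # choices listed in the docs table


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the two flags discussed here."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--flashinfer-allreduce-fusion-backend",
        type=str,
        default=None,  # None means the feature is disabled
        choices=BACKENDS,
    )
    parser.add_argument(
        "--enable-flashinfer-allreduce-fusion",
        action="store_true",  # deprecated boolean flag
    )
    return parser


def resolve_backend(args: argparse.Namespace):
    """Fold the deprecated flag into the new option (assumed precedence: explicit backend wins)."""
    if args.enable_flashinfer_allreduce_fusion:
        warnings.warn(
            "--enable-flashinfer-allreduce-fusion is deprecated; "
            "use --flashinfer-allreduce-fusion-backend=auto instead.",
            DeprecationWarning,
        )
        if args.flashinfer_allreduce_fusion_backend is None:
            args.flashinfer_allreduce_fusion_backend = "auto"
    return args.flashinfer_allreduce_fusion_backend
```

With this sketch, passing neither flag leaves the backend as `None`, which matches the documented "when not set, the feature is disabled" behavior.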
2 changes: 1 addition & 1 deletion python/sglang/srt/layers/communicator.py
@@ -100,7 +100,7 @@ def apply_flashinfer_allreduce_fusion(batch_size: int):
and batch_size > 0
and batch_size <= FUSE_ALLREDUCE_MAX_BATCH_SIZE
and not is_dp_attention_enabled()
- and get_global_server_args().enable_flashinfer_allreduce_fusion
+ and get_global_server_args().flashinfer_allreduce_fusion_backend is not None
and not is_flashinfer_allreduce_unavailable()
)
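The effect of this hunk is that the fusion gate now tests whether a backend was selected rather than whether a boolean flag was set. The sketch below isolates only the conjuncts visible in the hunk. `ServerArgs` here is a hypothetical stand-in dataclass, `should_fuse` is a hypothetical name, and the real `FUSE_ALLREDUCE_MAX_BATCH_SIZE` value and any conditions above the hunk are not shown in the diff, so the limit is passed in as a parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerArgs:
    """Hypothetical stand-in carrying only the field this hunk reads."""
    flashinfer_allreduce_fusion_backend: Optional[str] = None


def should_fuse(
    batch_size: int,
    args: ServerArgs,
    max_batch_size: int,  # stands in for FUSE_ALLREDUCE_MAX_BATCH_SIZE (value not shown in the diff)
    dp_attention_enabled: bool = False,
    flashinfer_unavailable: bool = False,
) -> bool:
    """Mirror the visible conjuncts: fusion requires a selected backend, not a boolean flag."""
    return (
        batch_size > 0
        and batch_size <= max_batch_size
        and not dp_attention_enabled
        and args.flashinfer_allreduce_fusion_backend is not None
        and not flashinfer_unavailable
    )
```

Any of `auto`, `trtllm`, or `mnnvl` satisfies the new check; only the default `None` disables fusion.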
