Merged
2440 commits
b24235b
[model-gateway] update workflow names for gateway and exclude npu (#1…
slin1237 Nov 17, 2025
8b5e2c5
[Tiny fix] Fix bench_speculative.py run bug (#13416)
BBuf Nov 17, 2025
b436113
[model-gateway] Add Gateway Release Tooling (#13420)
slin1237 Nov 17, 2025
ac406d4
fix uneven PP layer indices (#13282)
alpha-baby Nov 17, 2025
7afff8f
diffusion: fix wan2.2 ti2v num_frames adjust logic (#13379)
mickqian Nov 17, 2025
80797c2
[PD][bug fix] fix memleak when last_batch is none (#13144)
XucSh Nov 17, 2025
9ba3597
Fix cache_tokens calculate issue when retracted (#11900)
QiuMike Nov 17, 2025
15db549
[feature] Custom base path on FastAPI server (#5879)
kebyn Nov 17, 2025
df56139
Adding user defined hooks support (#13217)
Carlomus Nov 17, 2025
c236d05
Fix log time stats (#13418)
qhsc Nov 17, 2025
7b44526
[Ci tiny fix] Lower score threshold in evaluation test (#13443)
BBuf Nov 17, 2025
ff00b6a
diffusion: fix loading with local model_path (#13445)
mickqian Nov 17, 2025
6042010
[2/N] CI refactor: sperate some backend-independent CPU tasks. (#13447)
hnyls2002 Nov 17, 2025
e486308
Temporarily disable model hooks CI (#13450)
hnyls2002 Nov 17, 2025
a8fcbf6
[Deepseek V3.2] Use torch.compile to speed up torch.cat in nsa (#13022)
hlu1 Nov 17, 2025
25acbbc
Remove verbs from GET endpoint paths to follow REST standards (#13273)
slin1237 Nov 17, 2025
58f8f4e
Add missing models (#13456)
Kangyan-Zhou Nov 17, 2025
a63f433
extend sagemaker.Dockerfile serve script to allow all sglang serve fl…
sirutBuasai Nov 17, 2025
2bc7c5e
Fix 8-gpu B200 nightly tests (#13457)
Kangyan-Zhou Nov 17, 2025
ea89a3a
Fixes validation errors for Wan-AI models which store model weights i…
Kangyan-Zhou Nov 17, 2025
aac07bf
[Embeddings Performance Testing] Add performance test for embedding m…
vedantjh2 Nov 17, 2025
e389f91
[NVIDIA] Fix broken fp8 MoE of deepseek v3 (#13264)
kaixih Nov 18, 2025
a1e37b0
Temporarily comment out multimodal gen test to recover runners (#13463)
Kangyan-Zhou Nov 18, 2025
e2c9a59
Update pr-test.yml to fix invalid job name error
Kangyan-Zhou Nov 18, 2025
d879e37
Add interface_v1 option for dynamic HiCache backend (#13140)
pansicheng Nov 18, 2025
85ae508
Add bfloat16 tuned fused moe config for Dpsk-MTP layer on B200 (#13455)
Fridge003 Nov 18, 2025
9846f8e
fix MambaPool clear method after refactoring (#13449)
zminglei Nov 18, 2025
fe3bbfb
[AMD CI] Update sgl-router python path in dockerfile. (#13458)
saienduri Nov 18, 2025
7119d18
[CI] re-enable test_vision_openai_server_a ci (#13444)
yhyang201 Nov 18, 2025
d7984f3
Adding CI Monitor Improvements (#13462)
dougyster Nov 18, 2025
90c18a1
[GLM4.6v] Required changes for bumping up to transformer 5.x (#13229)
byjiang1996 Nov 18, 2025
26ca074
[GLM4.6v] Relax the constraint of non-user role chat completion messa…
byjiang1996 Nov 18, 2025
9188fec
[model-gateway] use worker startup time out for worker registration (…
slin1237 Nov 18, 2025
aa8ecbd
model: support JetVLM (#13289)
futrime Nov 18, 2025
f1be8aa
chore: add an unified server arg for multimodal inputs preprocess con…
WingEdge777 Nov 18, 2025
4c3573e
[PD] Clarify init method docstrings for kvsender and kvreceiver (#13476)
ShangmingCai Nov 18, 2025
4ce8fb3
Fix lora test (#13479)
hnyls2002 Nov 18, 2025
f338607
[Piecewise CUDA Graph] Support ModelOpt FP8 (#13094)
b8zhong Nov 18, 2025
0c96677
CI: fix NFS EBUSY error in PR test workflow (#13460)
alisonshao Nov 18, 2025
67071f5
[CI] fix triggered by a non-run-ci label (#13393)
hnyls2002 Nov 18, 2025
4e41edc
[CI] remove auto-labeling `run-ci` label. (#13486)
hnyls2002 Nov 18, 2025
a5ad006
fix: change performance log directory to cache path (#13482)
ch-wan Nov 18, 2025
595adf6
[CI] Add input for pr-gate (#13491)
hnyls2002 Nov 18, 2025
820e13c
[opt kimi k2 3/n] opt kimi_k2 moe_fused_gate kernel (#13374)
BBuf Nov 18, 2025
3390500
[CI] fix lint yml (syntax error) (#13496)
hnyls2002 Nov 18, 2025
ac81db6
[VLM][feat] Support encoder DP for Qwen2.5-VL (#13126)
liusy58 Nov 18, 2025
cfcf275
[HiCache] Critical fix to host memory double free (#13501)
xiezhq-hermann Nov 18, 2025
7e88b9c
[BugFix] Accuracy and function Issue when run ptpc quant model (#13157)
Yuechguo Nov 18, 2025
6beb6e9
fix: create git tags directly instead of temporary branches (#13168)
alisonshao Nov 18, 2025
e2d6746
Add .github/CI_PERMISSIONS.json to define the CI permissions (#13509)
merrymercy Nov 18, 2025
7bc99d4
README.md -> FOLDER_README.md (#13510)
merrymercy Nov 18, 2025
f6cfe9f
Use slash command to trigger CI (#13512)
merrymercy Nov 18, 2025
6380707
Add docs on trigger ci (#13513)
merrymercy Nov 18, 2025
518467b
[Feature] Re:Enable hybrid mem saver (#12962)
ocss884 Nov 18, 2025
6d025fd
Trigger CI retry with edit (#13516)
merrymercy Nov 18, 2025
2e1dbdb
Update docs (#13519)
merrymercy Nov 18, 2025
c1a30aa
Add /tag-and-rerun-ci (#13521)
sglang-bot Nov 18, 2025
109f27b
[CI] update pr-gate to be compatible with new slash triggering manane…
hnyls2002 Nov 18, 2025
6e9b154
[CI] fix skipping pr-gate on main (#13525)
hnyls2002 Nov 18, 2025
d79e129
Small cleanups related to LoRA weight loading (#13474)
glenliu21 Nov 18, 2025
f5566ac
[CI] fix CI skipped on main (#13527)
hnyls2002 Nov 18, 2025
b8e32e7
[model-gateway] fix gateway docker build due to recent py code change…
CatherineSue Nov 18, 2025
3a6ec47
[model-gateway] limit opened files in docker build to fix edge case (…
CatherineSue Nov 18, 2025
6b9459e
[docker] fix dockerfile naming for diffusion (#13534)
slin1237 Nov 18, 2025
a9d22b7
fix lora test (#13537)
gongwei-130 Nov 18, 2025
c0d1a33
Remove jet-ai/Jet-Nemotron-2B in nightly text tests as this is consta…
Kangyan-Zhou Nov 18, 2025
9b64f6f
[fix] Fixes accuracy issues caused by incorrect use of rope (#13495)
Baidu-AIAK Nov 18, 2025
92ad2ff
Flashinfer TRTLLM-GEN-MoE + Qwen3 (#13489)
b8zhong Nov 18, 2025
10969ae
[chore] Disable ccache for sgl-kernel release (#13541)
Fridge003 Nov 18, 2025
cf1f016
Add Qwen/Qwen1.5-MoE-A2.7B to model list (#13543)
Kangyan-Zhou Nov 18, 2025
9f59194
[Fix] Fix DeepSeek V3 MTP on B200 (#13548)
Fridge003 Nov 19, 2025
0d2d687
[router][grpc] Support num_reasoning_tokens in haromy models (#13047)
CatherineSue Nov 19, 2025
6c2e5fc
[feat][Ascend][Mindspore]: support model-impl of mindspore (#9234)
chz34 Nov 19, 2025
9a1a9a4
[AMD CI] Local cache fallback. (#13452)
saienduri Nov 19, 2025
f7be98e
[CI] fix amd 1 gpu basic test (#13551)
hnyls2002 Nov 19, 2025
075ba74
[Doc] Update HiCache and Mooncake docs & Mooncake Setup Error Checkin…
ykwd Nov 19, 2025
3798055
purge unnecessary env variable set in deterministic test (#13481)
zminglei Nov 19, 2025
b638abb
chore: bump sgl-kernel version to 0.3.17.post2 (#13542)
sglang-bot Nov 19, 2025
ba9102f
Add `lmsys/gpt-oss-20b-bf16` to model validation check (#13557)
hnyls2002 Nov 19, 2025
8900f99
CI Failure Monitor Improvements (#13558)
dougyster Nov 19, 2025
e197bef
[RL] Allow passing tensors of different dtypes for FlattenedTensorBuc…
zhuzilin Nov 19, 2025
97ba2c2
[CI] Fix CUDA workflow's dependency. (#13568)
hnyls2002 Nov 19, 2025
d4a4dcd
[NPU] Adapt pr-gate for pr-test workflow & workflows refresh (#13567)
iforgetmyname Nov 19, 2025
a1e1e53
Tiny enhance test suites sanity check (#13589)
hnyls2002 Nov 19, 2025
196b940
[3/N] CI refactor: move some manually triggered tests. (#13448)
hnyls2002 Nov 19, 2025
e72cf13
Support moe topk sigmoid kernel (#13049)
rogeryoungh Nov 19, 2025
a355794
Expend compatibility check for all quantized MoE models (#13465)
JustinTong0323 Nov 19, 2025
83756a4
add https://github.com/netanel-haber to CI_PERMISSIONS.json (#13577)
netanel-haber Nov 19, 2025
bfaf0b8
chore: bump sgl-kernel version to 0.3.17.post2 (#13570)
sglang-bot Nov 19, 2025
17b24ac
[Auto Sync] Update base_grammar_backend.py, collector.py (20251116) (…
merrymercy Nov 20, 2025
f88b2aa
[GDN] Remove unnecessary contiguous() (#13604)
byjiang1996 Nov 20, 2025
67fca6b
[GDN] Remove unnecessary conv state clone (#13603)
byjiang1996 Nov 20, 2025
af6bcad
[VLM] Support Piecewise CUDA Graph for Qwen2.5-VL (#13055)
yuan-luo Nov 20, 2025
127d59c
[diffusion] CI: improve diffusion CI (#13562)
mickqian Nov 20, 2025
48ca9f7
feat: support external custom models (#13429)
zhooooong Nov 20, 2025
dc69462
[CI fix] Fix image download failures in VLM CI tests (#13613)
BBuf Nov 20, 2025
c3c4da7
[NVIDIA] Add fp8 gemm benchmark on blackwell (#13528)
kaixih Nov 20, 2025
21370ef
[UT] Destroy process group after broadcast to resolve port occupation…
galeselee Nov 20, 2025
2e3a69a
[diffusion] refactor: remove PreprocessorConfig (#13248)
mickqian Nov 20, 2025
bc42c8c
[diffusion] refactor: refactor pipeline folders (#13253)
mickqian Nov 20, 2025
10e0b83
Add FP32 dtype support for RoPE - Part2 (#13328)
jinyouzhi Nov 20, 2025
c7b37b7
[diffusion] fix: remove multimodal_gen redundant get_bool_env_var fun…
shauntajoesph-ops Nov 20, 2025
7dcf910
Add support for new aiter version (AR accuracy, is_shuffled PR) (#13554)
1am9trash Nov 20, 2025
4a8442a
diffusion: improve baseline performance monitor (#13614)
mickqian Nov 20, 2025
b51f9bb
[Feature] Introduce JIT Kernel in sglang (with hicache JIT kernel) (#…
DarkSharpness Nov 20, 2025
19729f7
[CI] Align metric units for CI rate limit (#13633)
hnyls2002 Nov 20, 2025
c8ede0e
[ROCM] Optimized deepseek-r1 fp8 model with + triton_gemm_a8w8 + batc…
yctseng0211 Nov 20, 2025
2847e5c
fix bench_speculative bug (#13197)
Lzhang-hub Nov 20, 2025
7af9b88
Revert "[Feature] Introduce JIT Kernel in sglang (with hicache JIT ke…
merrymercy Nov 20, 2025
852eb6c
[CI] optimize CI workflow info (#13634)
hnyls2002 Nov 20, 2025
a352e83
CI: Kill zombie diffusion processes in CI & minor code style fix on r…
merrymercy Nov 20, 2025
4528cb7
[CI] apply pr-gate for XPU (#13663)
hnyls2002 Nov 20, 2025
acde21d
Add fused_rmsnorm_gated_cpu kernel for CPU to support Qwen3-Next (#11…
yanbing-j Nov 20, 2025
2dec555
[10/n] decouple quantization impl from vllm dependency - fix import (…
FlamingoPg Nov 20, 2025
fc9efdc
Adding nightly tests as release guard for bot bump workflows (#13655)
dougyster Nov 20, 2025
fa92441
[DeepseekV3.2] Deepseek fp8 support for MHA path (#12964)
YAMY1234 Nov 20, 2025
6bc3062
Fix launch of `Olmo3` (#13666)
vincentzed Nov 20, 2025
7291c72
[Deepseek V3.2] Change indexer weights_proj to fp32 (#13459)
hlu1 Nov 20, 2025
42028af
enable csgmv automatically on cuda (#13600)
b8zhong Nov 20, 2025
5a2c703
Add nightly test CI monitor workflow (#13038)
alisonshao Nov 20, 2025
ada8ce1
allow loras to be implicitly evicted and loaded based on max_loaded_l…
glenliu21 Nov 20, 2025
6b262ac
Test reorganization: Move tests to manual/ (#13610)
alisonshao Nov 20, 2025
b5344b3
[Piecewise CUDA Graph] Fix recompile issue for Mixtral and Grok2 (#13…
hebiao064 Nov 20, 2025
3f1cfd8
Super tiny remove unused MiniMaxM2MLP class (#13659)
fzyzcjy Nov 20, 2025
c56fc42
Update quantization.md with new model resources (#13677)
zhaochenyang20 Nov 20, 2025
3ae664d
[model-gateway] add both python and rust cli alias (#13678)
slin1237 Nov 21, 2025
c0a2513
[diffusion] CI: improve validation method (#13627)
mickqian Nov 21, 2025
c4db77f
[model-gateway] fix gateway cli arg parser to not use = (#13685)
CatherineSue Nov 21, 2025
81e8699
[CI] Move nightly tests to test/nightly/ (#13683)
alisonshao Nov 21, 2025
db2d362
[NVIDIA] Add cutedsl e2e test to GB200 CI (#12672)
kaixih Nov 21, 2025
64480ec
Add sgl-kernel CI test for Blackwell (B200) (#13301)
alisonshao Nov 21, 2025
750084a
remove unnecessary starvation check (#13619)
glenliu21 Nov 21, 2025
6be65ae
Fix target MLA with eagle3 support for PD disaggregation (#13555)
QiuMike Nov 21, 2025
fb04d43
[kimi k2 thinking] Avoid useless torch.zeros_ (#13596)
BBuf Nov 21, 2025
bfcf15a
[opt kimi k2 4 / n] Delete useless pad kernel in sgl_moe_align_block_…
BBuf Nov 21, 2025
475962a
[VLM] Support Piecewise CUDA Graph for InternVL (#13640)
yuan-luo Nov 21, 2025
d754ce9
[Piecewise Cuda Graph] rename, refactor and add more logging (#13675)
hebiao064 Nov 21, 2025
8c212a2
[difusion] CI: speed up multimodal_gen ci (#13665)
yhyang201 Nov 21, 2025
eda2f70
[diffusion] doc: minor update docs (#13177)
mickqian Nov 21, 2025
b537ac0
Fix ZMQ bind error on non-zero rank nodes when using SGLANG_BLOCK_NON…
ishandhanani Nov 21, 2025
4360279
[diffusion] server: use meta to avoid Linear init for TextEncoder (#1…
zyksir Nov 21, 2025
90a0133
[Auto Sync] Update http_server.py, io_struct.py, scheduler_... (20251…
merrymercy Nov 21, 2025
b30f63c
[Bugfix] Fix hidden state size in EAGLE PD disaggregation buffers (#1…
michelemarzollo Nov 21, 2025
1bb063a
[HiCache] fix unit test with changed new APIs (#13498)
stmatengss Nov 21, 2025
a244c03
[Fix] Qwen3Next lmhead dtype (#13708)
ZeldaHuang Nov 21, 2025
589d9ad
[NPU] chore: bump to CANN 8.3.RC1 and Pytorch 2.8.0 (#13647)
iforgetmyname Nov 21, 2025
6d0e0b9
[11/N] MoE Refactor: Simplifying SBO Implementation with Dispatcher H…
ch-wan Nov 21, 2025
a34d3ab
[Clean code] Compressed_tensors_moe code clean (#13719)
BBuf Nov 21, 2025
5e7f91d
[diffusion] profile: support performance metric dumping and compariso…
mickqian Nov 21, 2025
eff7df6
[AMD] Enable fused shared expert append and flatten quant for fp8 dee…
yichiche Nov 21, 2025
323fed5
[diffusion] doc: add contributing.md (#13649)
mickqian Nov 21, 2025
dc83690
fix 3fs down, lock schedule main thread (#13407)
weibingo Nov 21, 2025
99e13d1
Fix url: use https://roadmap.sglang.io for roadmap (#13733)
merrymercy Nov 21, 2025
1776dce
Super tiny delete unused files (#13734)
fzyzcjy Nov 21, 2025
aa6e2c8
[diffusion] log: minor improve logging (#13735)
mickqian Nov 21, 2025
964cded
[CI] minor hot fix of model validation list (#13737)
hnyls2002 Nov 21, 2025
e94ef9f
Add to ci permission (#13739)
guapisolo Nov 21, 2025
85ffce3
[Piecewise CUDA Graph] Support Kimi-K2 (non-Thinking) (#13466)
b8zhong Nov 21, 2025
dab06b5
Fix: CI monitor should not exit with error on regressions (#13694)
alisonshao Nov 21, 2025
681b9e6
Revert "enable csgmv automatically on cuda" (#13707)
Qiaolin-Yu Nov 21, 2025
45c572c
Support torch 12.9 + DeepEP by removing custom nvshmem (#12949)
fzyzcjy Nov 21, 2025
a24aefe
add some more labels (#13701)
b8zhong Nov 21, 2025
1b48e1b
Feat/nemotron nano v3 support (#12690)
roikoren755 Nov 21, 2025
a56f770
Fix global scaling factor loading hang (#13484)
wenscarl Nov 22, 2025
59b4d7f
Fix B200 Nightly tests and move one manual test back to unit test to …
Kangyan-Zhou Nov 22, 2025
53620a1
fix test_lora_update.py starvation message check (#13702)
glenliu21 Nov 22, 2025
94ae816
Fix model weights validation with automatic cache cleanup (#13729)
alisonshao Nov 22, 2025
b41afa3
[Auto Sync] Update evict_policy.py, radix_cache.py (20251120) (#13669)
merrymercy Nov 22, 2025
8bfce9b
[Tiny] Renaming environ for NVFP4 dispatch (#13756)
Fridge003 Nov 22, 2025
3805243
modularize gsm8k and mmmu test classes (#13506)
netanel-haber Nov 22, 2025
0eea17e
Use dual stream for DS MoE whenever cuda graph is used (instead of wi…
trevor-m Nov 22, 2025
a92afb0
[Ascend] support Kimi-K2-Thinking (#12759)
zhuyijie88 Nov 22, 2025
ac43822
Refactor eagle bigram key matching (#13714)
hnyls2002 Nov 22, 2025
a22de64
[diffusion] fix: fix hunyuanvideo and add 2-gpu ci test (#13720)
yhyang201 Nov 22, 2025
3e804bb
Update mem checker during busy (#13704)
hnyls2002 Nov 22, 2025
3397bce
Tiny support different prompts in `send_one.py` (#13768)
hnyls2002 Nov 22, 2025
ca548d8
[diffusion] refactor: refactor sampling params (#13706)
mickqian Nov 22, 2025
5625e32
[VLM] Replace torch.repeat_interleave with faster np.repeat for Qwen-…
yuan-luo Nov 22, 2025
8631246
[Spec v2] Remove `allocate_lens` and enable over-allocation (#13478)
hnyls2002 Nov 22, 2025
dd30361
[diffusion] CI: tinyfix diffusion ci (#13769)
yhyang201 Nov 22, 2025
5a4394a
align code style eagle draft&draft_extend cuda graph runner (#13533)
cicirori Nov 22, 2025
3990b84
Refactor MHA & MLA KV caches to support FP4 (#13547)
JackChuang Nov 22, 2025
b29769f
Move unnecessary input_addr capture under debug mode flag for speed-u…
byjiang1996 Nov 22, 2025
cad7878
Gather static input buffers for cuda graph (#13676)
cctry Nov 22, 2025
0479350
Revert "Fix RMSNorm API CALL mismatch issue. (#10032)" (#13727)
ErsongWang Nov 22, 2025
e014867
[model-gateway] update smg code owner (#13777)
slin1237 Nov 22, 2025
5354d7b
[model-gateway] clean up router manager function order (#13776)
slin1237 Nov 23, 2025
a90435c
Fix typo in docs (#13709)
yinpeiqi Nov 23, 2025
ac5505b
[Feature] HiCache JIT kernel (once again) (#13764)
DarkSharpness Nov 23, 2025
b964ce6
[DeepEP] Add SGLANG_DEEPEP_BF16_DISPATCH env var in Normal mode (#13787)
BBuf Nov 23, 2025
53fffef
Upgrade flashmla kernel for NSA tp support (#13718)
YAMY1234 Nov 23, 2025
d459396
[diffusion] feat: support sp for image models (#13180)
mickqian Nov 23, 2025
dd70cf9
[diffusion] CI: add run_suite to multimodal_gen CI (#13791)
mickqian Nov 23, 2025
aaa40a9
Fix pagination bug in CI monitor preventing performance-test-2-gpu da…
alisonshao Nov 23, 2025
5c29154
[Scheduler] Tiny organize code style (#13806)
hnyls2002 Nov 23, 2025
618ca23
[Deepseek] Refactor deepseek server_args _handle_model_specific_adjus…
hlu1 Nov 23, 2025
c9bd1ac
[CI] Tiny refactoring sgl-kernel tests (#13813)
Fridge003 Nov 23, 2025
2892265
Tune fp8_w8a8 fused triton moe for GLM-4.6-FP8 (#13815)
Qiaolin-Yu Nov 23, 2025
18403f6
make trtllm attn backend's init_forward_metadat non blocking (#13802)
cicirori Nov 23, 2025
9054e84
remove package json which is not used (#13810)
slin1237 Nov 23, 2025
4683e24
[1/2] Refactor DeepGeem requant for FP8 Linear on Blackwell (#13601)
Fridge003 Nov 24, 2025
a22104a
chore: bump sgl-kernel version to 0.3.18 (#13816)
sglang-bot Nov 24, 2025
d5e0346
xgrammar up version to 0.1.27 (#13650)
Swipe4057 Nov 24, 2025
dbf2215
Fix bug: Incorrect variable used in rem_total_token_offset calculatio…
liuhuijiayou Nov 24, 2025
9ea1953
[Doc] Refine fused_moe_triton configs doc (#13820)
BBuf Nov 24, 2025
75222bf
Update MindSpore documentation (#13656)
wangtiance Nov 24, 2025
b2f7b08
Refactor cache init logic (#13800)
hnyls2002 Nov 24, 2025
f56b9b4
[Bugfix] Add jit kernel files in packaging (#13829)
yuan-luo Nov 24, 2025
414248e
[diffusion] doc: minor update contributing.md with test section (#13792)
mickqian Nov 24, 2025
981ca83
[misc] Rename minilb install env & remove files & fix lint (#13831)
hnyls2002 Nov 24, 2025
e5c0f59
[diffusion] CI: send nightly-test outputs of diffusion to slack for c…
yhyang201 Nov 24, 2025
04b52fa
[chore]Upgrade flashinfer to 0.5.3 (#13751)
Fridge003 Nov 24, 2025
aeac622
[Intel XPU]support xgrammar backend for intel xpu (#13245)
gaopengff Nov 24, 2025
ecefc79
[sgl-kernel Code Clean] Remove useless lightning_attention kernel (#1…
BBuf Nov 24, 2025
8ef1156
[VLM] Revise InternVL Piecewise CUDA Graph Supporting (#13846)
yuan-luo Nov 24, 2025
1dd9a6a
Fix TorchAO quant in VLM (#13508)
zhooooong Nov 24, 2025
a146f83
[Fix]: Adjust FutureMap's token_id_bufs Size to Prevent ChunkedPrefil…
ant-yy Nov 24, 2025
98b38de
Fix: Safe RoPE Cache Expansion to Prevent Position-ID Out-of-Bounds i…
YAMY1234 Nov 24, 2025
a95a380
[Fix] Fix uvloop get_event_loop() is not suitable for 0.22.x (#13612)
tom-jerr Nov 24, 2025
b60e769
Tiny unpin uvloop for other backends (#13858)
hnyls2002 Nov 24, 2025
a3b578f
[model-gateway] Refactor router e2e responses tests (#13745)
Nov 24, 2025
9535015
[Perf] Optimize DeepSeek-R1 w4afp8 glue kernels (#10027)
yuhyao Nov 24, 2025
94216a9
Fix quantized moe checker fail for Qwen3 dense fp8 model (#13853)
fzyzcjy Nov 24, 2025
9b4b344
[model-gateway] add grpc server code owner (#13865)
slin1237 Nov 24, 2025
fafaa2c
[BugFix] fix outplace_fused_experts missing is_gated (#13864)
zminglei Nov 24, 2025
9dc15d8
fix xgrammar_backend crash with malformed inputs (#13752)
gongwei-130 Nov 24, 2025
e83bd1f
[Auto Sync] Update schedule_batch.py, schedule_policy.py, b... (20251…
merrymercy Nov 24, 2025
bf10869
[Doc] Add an Introduction to Expert Parallelism (#13783)
ch-wan Nov 24, 2025
eb1d885
add LoRA warning if loading a preexisting LoRA adapter with a differe…
glenliu21 Nov 24, 2025
db0ffc0
[NPU] Fix NPU CI (#13834)
iforgetmyname Nov 25, 2025
4b45d55
Overlap glm moe gemms in two cuda streams (#13786)
Qiaolin-Yu Nov 25, 2025
de430b6
[Performance] Replace preprocess_video logic from GLM multimodal pro…
byjiang1996 Nov 25, 2025
b0a26ba
Add support for bf16 x bf16 cutlass fused MoE (#10275)
nvcastet Nov 25, 2025
a164259
[Router bugfix] Fix router_manager selecting the wrong router when en…
SYChen123 Nov 25, 2025
173e73f
Fix nightly test job to fail when any test fails (#13871)
alisonshao Nov 25, 2025
9384fa2
[diffusion] refactor: remove training-related code (#13860)
mickqian Nov 25, 2025
a2c388b
[CI] fix multimodel-gen-test job (#13874)
cyb70289 Nov 25, 2025
83e7207
[diffusion] CI: add validation and cleanup for corrupted safetensors …
alisonshao Nov 25, 2025
da182e4
[CI] fix lint error (#13891)
cyb70289 Nov 25, 2025
8ff3ef1
fix: draft model revision misuse model revision (#11893)
gongwei-130 Nov 25, 2025
f9fe063
Fix trace publish paths in nightly-test-nvidia workflow (#13888)
alisonshao Nov 25, 2025
ed8786b
Adding nightly tests for Kimi-K2-thinking, Qwen3, minimax-m2, GLM4.6 …
dougyster Nov 25, 2025
c1dd9a9
[Fix] JIT kernel dependencies in other platforms (#13889)
DarkSharpness Nov 25, 2025
cce2d74
remove RoPE CPU fp32 tests (#13827)
ZailiWang Nov 25, 2025
7cc43bd
Move test_dummy_grok_models.py from manual to srt (temporary) (#13901)
alisonshao Nov 25, 2025
407cb3c
[CI tiny fix] Enhance robustness of vision chunked prefill test with …
BBuf Nov 25, 2025
760c20b
update flashinfer_cubin==0.5.3 (#13848)
Lzhang-hub Nov 25, 2025
ba8af6f
Merge remote-tracking branch 'upstream/main' into sparse_attn_fa3
Nov 25, 2025
17232a1
fix
Nov 25, 2025
51ced49
fix
Nov 25, 2025
788 changes: 788 additions & 0 deletions .github/CI_PERMISSIONS.json

Large diffs are not rendered by default.

59 changes: 41 additions & 18 deletions .github/CODEOWNERS
@@ -1,21 +1,44 @@
-.github @merrymercy @zhyncs
-/docker @zhyncs @HaiShaw @ByronHsu
-/python/pyproject.toml @merrymercy @zhyncs
-/python/sglang/* @merrymercy @Ying1123 @zhyncs @hnyls2002
-/python/sglang/srt/constrained @hnyls2002
-/python/sglang/srt/disaggregation @ByronHsu @hnyls2002
-/python/sglang/srt/disaggregation/mooncake @ShangmingCai
-/python/sglang/srt/distributed @yizhang2077 @merrymercy
-/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy
-/python/sglang/srt/eplb @fzyzcjy
-/python/sglang/srt/function_call @CatherineSue
-/python/sglang/srt/layers @merrymercy @Ying1123 @zhyncs @ispobock @HaiShaw @ch-wan @BBuf @kushanam @Edwardf0t1
+.github @merrymercy @Fridge003 @ispobock @Kangyan-Zhou
+/docker @Fridge003 @ispobock @HaiShaw @ishandhanani
+/docker/npu.Dockerfile @ping1jing2 @iforgetmyname
+/python/pyproject.toml @merrymercy @Fridge003 @ispobock
+/python/sglang/multimodal_gen @mickqian
+/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
+/python/sglang/srt/disaggregation @ByronHsu @hnyls2002 @ShangmingCai
+/python/sglang/srt/disaggregation/ascend @ping1jing2 @iforgetmyname
+/python/sglang/srt/distributed @yizhang2077 @merrymercy @ch-wan
+/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy @JustinTong0323
+/python/sglang/srt/entrypoints/grpc_server.py @CatherineSue @slin1237
+/python/sglang/srt/eplb @fzyzcjy @ch-wan
+/python/sglang/srt/function_call @CatherineSue @JustinTong0323
+/python/sglang/srt/grpc @CatherineSue @slin1237
+/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @kushanam @Edwardf0t1
+/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg @AniZpZ
+/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2 @iforgetmyname
 /python/sglang/srt/lora @Ying1123 @Fridge003 @lifuhuang
-/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
+/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @zhyncs
 /python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
-/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @zhyncs @ispobock
-/python/sglang/srt/multimodal @mickqian @JustinTong0323
-/python/sglang/srt/speculative @Ying1123 @merrymercy @rkooo567 @kssteven418
-/sgl-kernel @zhyncs @ispobock @HandH1998 @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
-/sgl-router @slin1237 @ByronHsu
+/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2 @iforgetmyname
+/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @Fridge003 @ispobock
+/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2 @iforgetmyname
+/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201
+/python/sglang/srt/speculative @Ying1123 @merrymercy @hnyls2002
+/sgl-kernel @zhyncs @ispobock @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
+/sgl-router @slin1237 @CatherineSue
+/sgl-router/benches @slin1237
+/sgl-router/bindings/python @CatherineSue @key4ng @slin1237
+/sgl-router/py_test @CatherineSue @key4ng
+/sgl-router/src/config @slin1237
+/sgl-router/src/core @slin1237
+/sgl-router/src/data_connector @key4ng
+/sgl-router/src/grpc_client @CatherineSue @slin1237
+/sgl-router/src/mcp @key4ng @slin1237
+/sgl-router/src/policies @slin1237 @ByronHsu
+/sgl-router/src/proto @CatherineSue @slin1237
+/sgl-router/src/protocols @CatherineSue @key4ng
+/sgl-router/src/reasoning_parser @CatherineSue
+/sgl-router/src/routers @CatherineSue @key4ng @slin1237
+/sgl-router/src/tokenizer @slin1237 @CatherineSue
+/sgl-router/src/tool_parser @slin1237 @CatherineSue
+/test/srt/ascend @ping1jing2 @iforgetmyname
 /test/srt/test_modelopt* @Edwardf0t1
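A note for reviewers on how the finer-grained entries in this CODEOWNERS change interact: when several patterns match a file, GitHub uses the last matching pattern, so a specific entry such as `/python/sglang/srt/layers/quantization` takes over from the broader `/python/sglang/srt/layers` rule listed before it. The sketch below illustrates that precedence with a few rules copied from the new file. It is a simplified matcher, not GitHub's implementation: it treats a pattern either as a path prefix or as an `fnmatch` glob, and the file names used in the calls (`fp8.py`, `linear.py`) are hypothetical.

```python
from fnmatch import fnmatch

# A subset of rules from the new CODEOWNERS, kept in file order.
RULES = [
    ("/python/sglang/srt/layers",
     ["@merrymercy", "@Ying1123", "@Fridge003", "@ispobock", "@HaiShaw",
      "@ch-wan", "@BBuf", "@kushanam", "@Edwardf0t1"]),
    ("/python/sglang/srt/layers/quantization",
     ["@ch-wan", "@BBuf", "@Edwardf0t1", "@FlamingoPg", "@AniZpZ"]),
    ("/python/sglang/srt/layers/attention/ascend_backend.py",
     ["@ping1jing2", "@iforgetmyname"]),
    ("/test/srt/test_modelopt*", ["@Edwardf0t1"]),
]


def matches(pattern: str, path: str) -> bool:
    """Simplified match: glob if the pattern has '*', else file or directory prefix."""
    if "*" in pattern:
        return fnmatch(path, pattern)
    return path == pattern or path.startswith(pattern.rstrip("/") + "/")


def owners_for(path: str, rules=RULES) -> list:
    """Return the owners from the last matching rule, or [] if nothing matches."""
    owners = []
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners  # a later match replaces an earlier one
    return owners


# Hypothetical file names, used only to exercise the rules above.
print(owners_for("/python/sglang/srt/layers/quantization/fp8.py"))
print(owners_for("/python/sglang/srt/layers/linear.py"))
```

Under this reading, a change to a quantization file needs approval from the quantization owners only, while other files under `layers` still fall back to the broader list.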
12 changes: 12 additions & 0 deletions .github/FOLDER_README.md
@@ -0,0 +1,12 @@
# Maintenance Tools

This folder contains tools and workflows for automating maintenance tasks.

## CI Permissions

`CI_PERMISSIONS.json` defines the CI permissions granted to each user.
Maintainers can directly edit the file to add entries with `"reason": "custom override"`.
Maintainers can also run `update_ci_permission.py` to update it with some auto rules (e.g., top contributors in the last 90 days get full permissions).

## Others
- `MAINTAINER.md` defines the code maintenance model.
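A rough sketch of how an update script could combine the two editing paths described in the CI Permissions section above: manual entries and automatic rules. The real `CI_PERMISSIONS.json` schema and the logic of `update_ci_permission.py` are not visible in this diff, so everything here except the `"reason": "custom override"` marker is an assumption. The `can_trigger_ci` field, the `"top contributor"` reason string, the `update_permissions` function, the user names, and the choice to drop stale automatic entries are all hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def update_permissions(existing, commits, today, top_n=1, window_days=90):
    """Rebuild a permission table (hypothetical schema).

    existing: {username: entry}; entries whose reason is "custom override" are kept.
    commits:  iterable of (author, commit_date) pairs.
    """
    cutoff = today - timedelta(days=window_days)
    recent = Counter(author for author, day in commits if day >= cutoff)

    # Keep manual entries untouched; automatic entries are regenerated below.
    updated = {
        user: entry
        for user, entry in existing.items()
        if entry.get("reason") == "custom override"
    }
    for user, _count in recent.most_common(top_n):
        updated.setdefault(user, {"can_trigger_ci": True, "reason": "top contributor"})
    return updated


# Hypothetical data for illustration only.
existing = {
    "alice": {"can_trigger_ci": True, "reason": "custom override"},
    "erin": {"can_trigger_ci": True, "reason": "top contributor"},
}
commits = [
    ("bob", date(2025, 11, 1)),
    ("bob", date(2025, 11, 10)),
    ("dave", date(2025, 10, 20)),
    ("carol", date(2025, 6, 1)),  # outside the 90-day window
    ("carol", date(2025, 6, 2)),
    ("carol", date(2025, 6, 3)),
]
print(update_permissions(existing, commits, today=date(2025, 11, 25)))
```

The point of the sketch is the separation of concerns: a maintainer's manual entry survives every regeneration, while rule-derived entries are recomputed from recent activity.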
25 changes: 11 additions & 14 deletions .github/ISSUE_TEMPLATE/1-bug-report.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
 name: 🐞 Bug report
-description: Create a report to help us reproduce and fix the bug
+description: Report a bug to help us reproduce and fix it.
 title: "[Bug] "
 labels: ['Bug']

@@ -8,31 +8,28 @@ body:
   attributes:
     label: Checklist
     options:
-    - label: 1. I have searched related issues but cannot get the expected help.
-    - label: 2. The bug has not been fixed in the latest version.
-    - label: 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
-    - label: 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
-    - label: 5. Please use English, otherwise it will be closed.
+    - label: I searched related issues but found no solution.
+    - label: The bug persists in the latest version.
+    - label: Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
+    - label: If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
+    - label: Please use English. Otherwise, it will be closed.
 - type: textarea
   attributes:
     label: Describe the bug
-    description: A clear and concise description of what the bug is.
+    description: A clear, concise description of the bug.
   validations:
     required: true
 - type: textarea
   attributes:
     label: Reproduction
-    description: |
-      What command or script did you run? Which **model** are you using?
-    placeholder: |
-      A placeholder for the command.
+    description: Command/script run and model used.
+    placeholder: Paste the command here.
   validations:
     required: true
 - type: textarea
   attributes:
     label: Environment
-    description: |
-      Please provide necessary environment information here with `python3 -m sglang.check_env`. Otherwise the issue will be closed.
-    placeholder: Environment here.
+    description: Run `python3 -m sglang.check_env` and paste output here. Issues without this will be closed.
+    placeholder: Paste environment output here.
   validations:
     required: true
8 changes: 4 additions & 4 deletions .github/ISSUE_TEMPLATE/2-feature-request.yml
@@ -7,17 +7,17 @@ body:
   attributes:
     label: Checklist
     options:
-    - label: 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
-    - label: 2. Please use English, otherwise it will be closed.
+    - label: If this is not a feature request but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
+    - label: Please use English. Otherwise, it will be closed.
 - type: textarea
   attributes:
     label: Motivation
     description: |
-      A clear and concise description of the motivation of the feature.
+      Clearly and concisely describe the feature's motivation.
   validations:
     required: true
 - type: textarea
   attributes:
     label: Related resources
     description: |
-      If there is an official code release or third-party implementations, please also provide the information here, which would be very helpful.
+      Provide official releases or third-party implementations if available.
67 changes: 67 additions & 0 deletions .github/MAINTAINER.md
@@ -0,0 +1,67 @@
# SGLang Code Maintenance Model
This document describes the code maintenance model for the SGLang project.
Since SGLang is a large project involving multiple organizations and hardware platforms, we designed this model with the following goals:
- Ensure a responsive and smooth review process.
- Allow for fast iteration, so maintainers can sometimes bypass flaky CI tests for important PRs.

## Role Descriptions
There are four roles in this maintenance model. Some are custom roles, while others are predefined by GitHub.

- **Merge Oncall**: The person who drives the PR merge process. They have strong area-specific expertise and uphold a high bar for code quality.
  - Permission: Merge PRs. Bypass branch protection rules if needed.
  - Responsibility: Shepherd the merge of PRs assigned to their area. Revert or hotfix any issues related to their merge (especially if they bypass).
- **Codeowner**: The person who protects critical code. Without a bypass, each PR needs at least one Codeowner approval for each modified file protected by [CODEOWNERS](./CODEOWNERS). Please note that this role is not an honor but a significant responsibility because PRs cannot be merged without your approval (except when bypassed by a Merge Oncall).
  - Permission: Approve PRs, allowing them to be merged without a bypass.
  - Responsibility: Review PRs in a timely manner.
- **Write**: A person with write permission to the SGLang repo.
  - Permission: Merge PRs if they have passed required tests and been approved by Codeowners. This role cannot bypass branch protection rules.
  - Responsibility: Review and merge PRs in a timely manner.
- **CI Oncall**: A person who manages CI runners for specific hardware platforms.
  - Permission: Add CI runners.
  - Responsibility: Keep the CI runners up and running.

__Note__: Difference between Merge Oncall and Codeowner
- The Merge Oncall is an active role held by someone who actively tries to help merge PRs and can bypass CI if needed.
- The Codeowner is a passive protection role provided by GitHub; it prevents accidental changes to critical code.
- The list of Merge Oncalls is attached below. The list of Codeowners is in the [CODEOWNERS](./CODEOWNERS) file.

__Note__: The permissions to trigger CI tests are defined separately according to these [rules](https://docs.sglang.ai/developer_guide/contribution_guide.html#how-to-trigger-ci-tests).


## Pull Request Merge Process
1. The author submits a pull request (PR) and fills out the PR checklist.
2. A bot assigns this PR to a Merge Oncall and @-mentions them. At the same time, GitHub will automatically request reviews from Codeowners.
3. Someone tags the PR with a `run-ci` label ([help](https://docs.sglang.ai/developer_guide/contribution_guide.html#how-to-trigger-ci-tests)). Then the author can trigger CI by pushing new commits.
4. The Merge Oncall coordinates the review (e.g., asking people to review) and approves the PR; the Codeowners also approve the PR. If the assigned Merge Oncall is not responsive, the author can ping other related Merge Oncalls and Reviewers in the list below.
5. The code can now be merged:
   - **Ideal case:** For each modified file, one Codeowner has approved the PR. The PR has also passed the required CI tests. Then, anyone with write permission can merge the PR.
   - **Exception:** In cases where it is difficult to meet all requirements (due to flaky CI or slow responses), a Merge Oncall can bypass branch protection to merge the PR.
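
The merge rule in step 5 can be sketched as a small decision function. This is only an illustration of the policy described above, not code from the SGLang repository: the `PullRequest` class, its field names, and the role strings `"write"` and `"merge_oncall"` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical model of a PR; all field names are illustrative."""

    # Maps each modified file to the Codeowners listed for it.
    # A file that is not protected by CODEOWNERS maps to an empty set.
    owners_by_file: dict[str, set[str]]
    approvals: set[str] = field(default_factory=set)
    required_ci_passed: bool = False


def meets_ideal_case(pr: PullRequest) -> bool:
    """True if every protected file has an approving Codeowner and CI passed."""
    for owners in pr.owners_by_file.values():
        if owners and not (owners & pr.approvals):
            return False
    return pr.required_ci_passed


def can_merge(pr: PullRequest, role: str) -> bool:
    """Return True if someone holding `role` may merge `pr`."""
    if meets_ideal_case(pr):
        # Ideal case: anyone with merge permission can merge.
        return role in {"write", "merge_oncall"}
    # Exception: only a Merge Oncall can bypass branch protection.
    return role == "merge_oncall"
```

In this sketch, a PR with a missing Codeowner approval or failing required CI can only be merged by a Merge Oncall, which mirrors the exception path above.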

If you run into any issues during the merge, you can discuss them in the [Slack channels](https://slack.sglang.ai/): #dev, #pull-request, and #ci-cd-build-release.

## The List of Merge Oncalls and Reviewers
The format is @github-username (Slack username).

TODO: fill in the list.

We currently have many Merge Oncalls, mainly because the CI is flaky and the CODEOWNERS file is too coarse-grained.
In the future, we hope the CI improves so that bypasses are rarely needed. After that, most Merge Oncalls can be converted back to the Write and Codeowner roles.

This list is based on the current situation. If you or someone you know would like to take on more responsibility and are qualified, please ping @Lianmin Zheng and @Ying Sheng in the Slack channel. They will start a nomination and internal review process.

## The List of CI Oncalls
The format is @github-username (Slack username).

### NVIDIA GPUs
@merrymercy (Lianmin Zheng), @Kangyan-Zhou (Kangyan Zhou), @ch-wan (Cheng Wan), @HanHan009527 (hanhan), @ishandhanani (Ishan Dhanani), @key4ng (Keyang Ru), @slin1237 (Simo Lin), @ShangmingCai (Shangming Cai)

### AMD GPUs
@saienduri (Sai Enduri), @HaiShaw (Henry HAI)

### Intel CPU and XPU
@mingfeima (Mingfei Ma), @DiweiSun (Diwei Sun)

### Ascend NPUs
@iforgetmyname (Even Zhou)

This list is based on the current situation. If you or someone you know would like to donate machines for CI, they can serve as the CI oncalls for their machines. Please ping @Lianmin Zheng and @Ying Sheng in the Slack channel. They will start a nomination and internal review process.
53 changes: 0 additions & 53 deletions .github/REVIEWERS.md

This file was deleted.

110 changes: 110 additions & 0 deletions .github/labeler.yml
@@ -0,0 +1,110 @@
# Configuration for the GitHub Labeler action
# Automatically adds labels to PRs based on the files changed

# Router specific (Rust code in sgl-router)
model-gateway:
  - changed-files:
      - any-glob-to-any-file: 'sgl-router/**/*'

# Kernel specific
sgl-kernel:
  - changed-files:
      - any-glob-to-any-file: 'sgl-kernel/**/*'

# Documentation
documentation:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*.md'
          - 'docs/**/*'
          - 'README*'

# Dependencies
dependencies:
  - changed-files:
      - any-glob-to-any-file:
          - '**/requirements*.txt'
          - '**/Cargo.toml'
          - '**/Cargo.lock'
          - '**/pyproject*.toml'
          - '**/setup.py'
          - '**/poetry.lock'
          - '**/package.json'
          - '**/package-lock.json'

# Multi-modal
Multi-modal:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*multimodal*'
          - '**/*vision*'
          - '**/*vlm*'

# Diffusion
diffusion:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/multimodal_gen/**/*'

# LoRA
lora:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*lora*'

# Quantization
quant:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*quant*'
          - '**/*quantization*'

# Speculative decoding
speculative-decoding:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*speculative*'

# AMD specific
amd:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*amd*'
          - '**/*rocm*'

# NPU specific
npu:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*npu*'
          - '**/*ascend*'

# Blackwell
blackwell:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*nvfp4*'
          - 'sgl-kernel/csrc/attention/cutlass_sm100_mla/**/*'
          - 'python/sglang/srt/layers/attention/trtllm_mla_backend.py'
          - 'python/sglang/srt/layers/attention/trtllm_mha_backend.py'

# DeepSeek specific
deepseek:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*deepseek*'

# HiCache
hicache:
  - changed-files:
      - any-glob-to-any-file:
          - '**/*hicache*'

# Deterministic
deterministic:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/srt/batch_invariant_ops/**/*'

# Piecewise CUDA Graph
piecewise-cuda-graph:
  - changed-files:
      - any-glob-to-any-file: 'python/sglang/srt/compilation/**/*'
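
The `any-glob-to-any-file` rules above apply a label when at least one listed glob matches at least one changed file. The following is a minimal sketch of that idea, not the labeler action's own matcher: `glob_to_regex`, `labels_for`, and `RULES` are hypothetical names, and the glob semantics are an assumed approximation (`**/` matches zero or more directories, `*` does not cross `/`).

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a regex (approximate semantics)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def labels_for(changed_files, rules):
    """Return the labels that have any glob matching any changed file."""
    matched = []
    for label, globs in rules.items():
        regexes = [glob_to_regex(g) for g in globs]
        if any(r.fullmatch(f) for f in changed_files for r in regexes):
            matched.append(label)
    return matched


# A small subset of the rules above, copied for illustration.
RULES = {
    "model-gateway": ["sgl-router/**/*"],
    "documentation": ["**/*.md", "docs/**/*", "README*"],
    "lora": ["**/*lora*"],
}
```

Under these assumptions, `labels_for(["sgl-router/src/main.rs", "README.md"], RULES)` selects both `model-gateway` and `documentation`. The file paths in that call are made up for the example.
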
2 changes: 2 additions & 0 deletions .github/pull_request_template.md
@@ -22,3 +22,5 @@
- [ ] Add unit tests according to the [Run and add unit tests](https://docs.sglang.ai/developer_guide/contribution_guide.html#run-and-add-unit-tests).
- [ ] Update documentation according to [Write documentations](https://docs.sglang.ai/developer_guide/contribution_guide.html#write-documentations).
- [ ] Provide accuracy and speed benchmark results according to [Test the accuracy](https://docs.sglang.ai/developer_guide/contribution_guide.html#test-the-accuracy) and [Benchmark the speed](https://docs.sglang.ai/developer_guide/contribution_guide.html#benchmark-the-speed).
- [ ] Follow the SGLang code style [guidance](https://docs.sglang.ai/developer_guide/contribution_guide.html#code-style-guidance).
- [ ] Work with maintainers to merge your PR. See the [PR Merge Process](https://github.com/sgl-project/sglang/blob/main/.github/MAINTAINER.md#pull-request-merge-process).