Merged
705 commits
7f3308b
[model-gateway] extract conversation out of oai router (#14440)
slin1237 Dec 4, 2025
7dfcc78
[DeepseekV3.2][NSA][Indexer] Fix PAGED top-k transform for NSA indexe…
YAMY1234 Dec 4, 2025
d8faf2f
[model-gateway] move oai header util to router header util (#14441)
slin1237 Dec 4, 2025
922756a
[FIX] trtllm-moe-fp4-renorm for Qwen series models (#14350)
samuellees Dec 4, 2025
88d1bab
add doc for quantized kv cache (#14348)
b8zhong Dec 4, 2025
0e6441b
fix: Correct environment variable syntax in docker-compose configurat…
yankay Dec 4, 2025
eb85fa6
[model-gateway] move all responses api event from oai to proto (#14446)
slin1237 Dec 4, 2025
29c6c2e
[model-gateway] add mistral 3 image processor (#14445)
slin1237 Dec 4, 2025
c1006fd
[model-gateway] grpc to leverage event type (#14450)
slin1237 Dec 4, 2025
6d37e70
ministral3 (#14251)
JustinTong0323 Dec 4, 2025
2ecee75
[Bug] fix not desired disable fused share experts caused by rocm logi…
ocss884 Dec 5, 2025
b5d3998
Rename secrets.WHL_TOKEN -> secrets.GH_PAT_FOR_WHL_RELEASE (#14421)
sglang-bot Dec 5, 2025
fa0ca97
[diffusion] improve: further optimize model load (#13836)
zyksir Dec 5, 2025
532037d
Add CI permissions for user 'yushengsu-thu' (#14468)
alisonshao Dec 5, 2025
41429a8
[ez] Fix typing (#14473)
Dec 5, 2025
4c5074e
Add AMD stage support to /rerun-stage command and fix related bugs (#…
alisonshao Dec 5, 2025
80a575e
Add YAMY1234 to CI Permission (#14475)
Fridge003 Dec 5, 2025
b76e303
clean up gemlite usage (#14444)
zminglei Dec 5, 2025
beec8ee
[diffusion] chore: further improve model searching logic (#14484)
mickqian Dec 5, 2025
46b05ef
[diffusion] fix: fix bug about pin memory when offloading (#14472)
zyksir Dec 5, 2025
7c744d1
[diffusion] cli: add argument --adjust-frames and --override-protecte…
gmixiaojin Dec 5, 2025
498ea41
dockerfile: add runtime stage + ubuntu 24.04 (#13861)
ishandhanani Dec 5, 2025
35ba6fe
[diffusion] fix: fix CLIP text encoder attention mask not used (#14364)
niehen6174 Dec 5, 2025
2ce121a
Enable RadixCache for Mamba2 models (#13584)
roikoren755 Dec 5, 2025
5347732
[diffusion] fix: Fix profiler trace missing Python stack in diffusion…
BBuf Dec 5, 2025
8fce9e7
support GLM-V vision model dp (#14097)
zRzRzRzRzRzRzR Dec 5, 2025
7235a7f
[misc] add model arch and type to server info and use it for harmony …
slin1237 Dec 5, 2025
205f041
Add Mistral Large 3 Eagle Support (#14466)
elvischenv Dec 5, 2025
6628098
Add Mistral Large 3 to nightly CI tests (#14459)
alisonshao Dec 5, 2025
a890456
[diffusion] chore: set allowing overriding protected fields of sampli…
mickqian Dec 5, 2025
0528437
[model-gateway] move conversation to first class routing (#14506)
slin1237 Dec 5, 2025
889b46e
[Spec] Mamba2 support in target models (#13434)
roikoren755 Dec 5, 2025
66984a8
[diffusion] feat: support cache-dit integration (#14234)
Brain97 Dec 5, 2025
38daa29
Add fused FP8 KV cache write kernel for TRTLLM MHA backend (#14093)
harvenstar Dec 5, 2025
5a46fb1
[model-gateway] Add WASM support for middleware (#12471)
tonyluj Dec 5, 2025
1569fc7
[model-gateway] reorganized conversation handler (#14507)
slin1237 Dec 5, 2025
ec7b2c1
tiny remove deprecated endpoint call (#13607)
b8zhong Dec 5, 2025
cf9a774
[model-gateway] fix server info comment (#14508)
slin1237 Dec 5, 2025
16e8463
Add Mistral Large 3 basic test to PR CI (#14460)
alisonshao Dec 5, 2025
e73173b
Fix removing worker will make it healthy forever in prometheus metric…
fzyzcjy Dec 5, 2025
1ea6b74
[model-gateway] Make Tokenizer Builder Aware of Env Vars Like HF_ENDP…
xuwenyihust Dec 5, 2025
49dfa1d
[model-gateway] change sgl-router to sgl-model-gateway (#14312)
slin1237 Dec 5, 2025
aed835e
[model-gateway] fix left over sgl-router names to sgl-model-gateway (…
slin1237 Dec 5, 2025
09376fd
[model-gateway] fix logs in smg workflow (#14513)
slin1237 Dec 5, 2025
b72f026
[model-gateway] fix left over sgl-router names in wasm (#14514)
slin1237 Dec 5, 2025
959a174
[model-gateway] fix code owner for wasm (#14516)
slin1237 Dec 5, 2025
e11f795
chore: bump sgl-kernel version to 0.3.18.post3 (#14427)
sglang-bot Dec 5, 2025
3d1b591
Tiny use trtllm_mha as default when possible (#14291)
fzyzcjy Dec 5, 2025
e41664b
[Docs] Add /rerun-stage command to contribution guide (#14521)
alisonshao Dec 5, 2025
b988c18
Fix safetensors validation to catch corruption after download (#14465)
alisonshao Dec 6, 2025
a0dde90
[CODEOWNER] update codeowner for qwen3-next related (#14522)
hanming-lu Dec 6, 2025
2ac5b98
fix: fix rmsnorm -> layernorm in qwen3 omni (#11791)
vincentzed Dec 6, 2025
d881f31
[diffusion] chore: temporarily upgrade diffusers to make Z-image comp…
mickqian Dec 6, 2025
d30d6b3
[bug] fix notebook to include new keys from model_info (#14528)
slin1237 Dec 6, 2025
7b0c7ad
Revise DP Multi-Modal Encoder Document (#14290)
yhyang201 Dec 6, 2025
d257bf8
[CPU] add mamba fla kernels for Qwen3-next (#12324)
blzheng Dec 6, 2025
42fcf54
Revert "tiny remove deprecated endpoint call" (#14533)
Fridge003 Dec 6, 2025
ea17737
support mtp with deepseek r1 nvfp4 model (#13115)
rainj-me Dec 6, 2025
35a9a07
[diffusion] refactor: simplify sampling params' override logic (#14539)
mickqian Dec 6, 2025
6d41791
[diffusion] perf: add QKV fusion optimization for Flux models (#14505)
BBuf Dec 6, 2025
e12c6b3
[model-gateway][tracing]: implement request tracing using OpenTelemet…
sufeng-buaa Dec 6, 2025
80122e4
[diffusion] lora: fix LoRA dtype handling and weight attribute access…
niehen6174 Dec 6, 2025
3e40c63
fix "GrammarMatcher has terminated after accepting the stop token, bu…
gongwei-130 Dec 6, 2025
bc38847
[1/n] Fix hanging during DeepGemm Warmup (#14493)
Fridge003 Dec 6, 2025
cee93a6
[Bug fix] Add /model_info endpoint to mini_lb (#14535)
alisonshao Dec 6, 2025
e592ee6
[Qwen3-next] remove heuristics and add radix cache kl test (#14520)
hanming-lu Dec 6, 2025
9dfa01a
[Misc]Register and refactor some environs for dpsk-fp4 and DeepEp (#1…
Fridge003 Dec 6, 2025
d2b4247
chore: bump sgl-kernel version to 0.3.18.post3 (#14518)
sglang-bot Dec 6, 2025
5edbe35
Update CI_PERMISSIONS.json (#14552)
harrisonlimh Dec 6, 2025
5f6f550
Update DeepSeek V3 docs to use B200 (#14447)
leejnau Dec 7, 2025
dd91d38
[Doc] Add short explanation on page size (#14557)
b8zhong Dec 7, 2025
ff6e3ea
[docs] Add missing word in argument description (#14205)
almaslof Dec 7, 2025
be4a3ec
support piecewise cuda graph for Olmo models (#14476)
zminglei Dec 7, 2025
32a32cf
Enhance prefill PP node robustness (#14494)
qhsc Dec 7, 2025
91c9c14
DOC update nemo-skills in docs (#14555)
gwarmstrong Dec 7, 2025
6d5d76a
remove unecessary dual stream token threshold from the rest of models…
b8zhong Dec 7, 2025
0e4d879
feat(ci): add framework target to release-docker workflows (#14559)
ishandhanani Dec 7, 2025
3c7886e
Fix attention backend logic for Qwen3-Next on SM100 (#14560)
Chen-0210 Dec 7, 2025
41d61fa
[FLA] Add explicit kernel arguments to kda.py for Kimi Linear support…
alisonshao Dec 7, 2025
e5135b7
Add CUDA kernel size analysis tool for sgl-kernel optimization (#14544)
BBuf Dec 7, 2025
9abcab3
[DLLM] feat: Add threshold based parallel decoding support (#14412)
btw616 Dec 7, 2025
f2b5dcc
Add unit-test-backend-8-gpu-b200 to rerun-stage command (#14569)
alisonshao Dec 7, 2025
26d9500
[apply][2/2] Fused qk_norm_rope for Qwen3-MoE (#13998)
yuan-luo Dec 7, 2025
ae6a663
Add Expert Parallelism (EP) support for kimi-k2-thinking (#13725)
BBuf Dec 7, 2025
88c459c
Tiny remove wrong import from `python.sglang` (#14577)
hnyls2002 Dec 7, 2025
125e17e
Add small model test for spec v2 + dp + trtllm_mla (#14576)
hnyls2002 Dec 7, 2025
c8683ae
[diffusion] cli: profiling utilities support (#14185)
AichenF Dec 7, 2025
f124539
[NPU]LoRA: Adding Torch Native backend (#14132)
vlserov Dec 7, 2025
948b6ac
[BugFix] fix prefixcache performance and accuracy on ascend (#13573)
khalil2ji3mp6 Dec 7, 2025
84efe54
Fix FP8 KV Triton type issue and add regression test (#14553)
harvenstar Dec 7, 2025
f6423b6
Rename TensorRT Model Optimizer to Model Optimizer (#14455)
Edwardf0t1 Dec 7, 2025
3b47973
[CI] Tiny speed up VLM CI (#14517)
b8zhong Dec 7, 2025
673c11b
[Minor] Temporarily skipping deepep large mtp test (#14586)
Fridge003 Dec 7, 2025
b0bbc7f
[model-gateway] extra accumulator and tool handler in oai router (#14…
slin1237 Dec 7, 2025
5e2cda6
[model-gateway] Fixed WASM Security Vulnerability - Execution Timeout…
slin1237 Dec 8, 2025
aff1238
[model-gateway] reorganize metrics, logging, and otel to its own modu…
slin1237 Dec 8, 2025
03b835e
Refactor tuning block wise kernel and opt Qwen/Qwen3-VL-32B-Instruct-…
BBuf Dec 8, 2025
6799847
[CI]Unblock and split spec v2+dp test (#14551)
Fridge003 Dec 8, 2025
b7b7524
[Tool Call] Fix DeepSeekV32Detector skipping functions with no params…
momaek Dec 8, 2025
f57d4fe
[feat] use cachebuffer to store mm feature to speedup hash (#14386)
liusy58 Dec 8, 2025
559202b
[CI] Fix unit-test-backend-8-gpu-b200 running on every /rerun-stage (…
alisonshao Dec 8, 2025
a4ffd66
[model-gateway] fix WASM memory limit per module (#14600)
slin1237 Dec 8, 2025
85d0ccf
Tiny fix missing policy decision recording (#14605)
fzyzcjy Dec 8, 2025
1915a1f
Super tiny remove unneeded policy flag (#14608)
fzyzcjy Dec 8, 2025
8fbf7dd
[model-gateway] refactor otel to be more efficient (#14604)
slin1237 Dec 8, 2025
c08b780
Super tiny remove unused select_worker_pair (#14609)
fzyzcjy Dec 8, 2025
2970f22
[model-gateway] fix WASM unbounded request/response body read vuln (#…
slin1237 Dec 8, 2025
661e977
[2/2] Add rope kernel in sgl-kernel (#14452)
Qiaolin-Yu Dec 8, 2025
36361ad
[DLLM] Add initial cuda graph support (#14203)
btw616 Dec 8, 2025
a2ca9bd
Super tiny fix unused code in router (#14618)
fzyzcjy Dec 8, 2025
cf0478d
[Glm46v] Bug fix for accuracy drop and unable to launch server (#14585)
byjiang1996 Dec 8, 2025
aeff0d3
Fix amd rope definition (#14556)
Qiaolin-Yu Dec 8, 2025
f72a770
modify the sgl-kernel to be compatible with transformers 5.x. (#14625)
yhyang201 Dec 8, 2025
06836ad
[Reasoning + Structured Output] make reasoning compatible with struct…
Muqi1029 Dec 8, 2025
12a08ef
[diffusion] feat: add support for LoRA layers in transformer_2 within…
Prozac614 Dec 8, 2025
4a62a0e
chore: bump sgl-kernel version to 0.3.19 (#14632)
sglang-bot Dec 8, 2025
7871593
[cpu] Implement all gather/reduce for arm64 cpu (#12527)
cyb70289 Dec 8, 2025
80cfca5
[diffusion] chore: further refine output resolution adjustment logic …
mickqian Dec 8, 2025
cb4cdb4
Fix dp-aware incompatible with service-discovery (#14629)
fzyzcjy Dec 8, 2025
8200fb5
update transformers package version to 5.0.0rc0 (#14356)
yhyang201 Dec 8, 2025
2de9801
chore: bump sgl-kernel version to 0.3.19 (#14649)
sglang-bot Dec 8, 2025
9a327bd
chore: bump SGLang version to 0.5.6.post1 (#14651)
sglang-bot Dec 8, 2025
763888b
[AMD] change fused rms quant interface for aiter upgrade (#14497)
yctseng0211 Dec 8, 2025
d69ecc1
[model-gateway] reducing cpu overhead in various of places (#14658)
slin1237 Dec 8, 2025
39f9a9c
[model-gateway] reduce cpu overhead in grpc router (#14663)
slin1237 Dec 8, 2025
7bf16c6
[model-gateway] fix WASM arbitrary file read security vol (#14664)
slin1237 Dec 8, 2025
8810152
vlm: Use fa3 as the default backend for qwen3 vl (#14634)
mickqian Dec 8, 2025
8550822
[model-gateway] Optimize memory usage in HTTP router (#14667)
slin1237 Dec 8, 2025
b9bef31
fix: use .get() when accessing strict mem-check env variable (#14657)
yhyang201 Dec 8, 2025
32f8b60
improve default glm mtp setting (#14457)
b8zhong Dec 8, 2025
2e3946d
Fix cache-aware router should pick min load instead of min tenant siz…
fzyzcjy Dec 8, 2025
6abb805
Bump up diffusers to latest official release version (#14670)
byjiang1996 Dec 8, 2025
edde5e5
[model-gateway] add OTEL integration to grpc router (#14671)
slin1237 Dec 8, 2025
93043f7
[CI] Increase max-parallel to 15 for high priority PRs (#14675)
alisonshao Dec 8, 2025
07404d7
[HiCache] fix condition check when use decode offload (#14489)
ssssnow Dec 8, 2025
eac5b66
[RadixTree] Optimize the Time Complexity of Node Retrieval Operation …
CLFutureX Dec 8, 2025
119fd95
Tiny support printing requests in bench_serving for observability (#1…
fzyzcjy Dec 9, 2025
c106b54
Aiter fp8 kv cache (#13147)
kkHuang-amd Dec 9, 2025
6f65707
[SMG]feat: implement TokenGuardBody for managing token return (#14653)
jimmy-evo Dec 9, 2025
60d36e7
[NPU] chore: bump basic software version to 8.3.rc2 (#14614)
iforgetmyname Dec 9, 2025
e5201bd
[CI] Unblock gb200 cutedsl test (#14469)
Fridge003 Dec 9, 2025
ef3f8c9
Add ffmpeg into sglang docker - required by transformers multimodal V…
byjiang1996 Dec 9, 2025
08da4c2
[Bugfix] Fix KeyError for Mistral-Large-3 rope_scaling config (#14627)
alisonshao Dec 9, 2025
af20657
Tiny support sgl-router http response status code metrics (#14689)
fzyzcjy Dec 9, 2025
e6f0ddd
[CI] Migrate Eagle 1-GPU tests to test/registered/ (#14529)
alisonshao Dec 9, 2025
0e0b0c0
Revert "[Bug] fix not desired disable fused share experts caused by r…
zhyncs Dec 9, 2025
ce4e836
Add per-request decode tp size (#14678)
merrymercy Dec 9, 2025
af60cad
[ci][smg] fix docker release ci and add it to pr test (#14683)
slin1237 Dec 9, 2025
817daba
Tiny extract select_worker_min_load (#14648)
fzyzcjy Dec 9, 2025
da3dc49
Fix dp-aware incompatible with completions and chat completions APIs …
fzyzcjy Dec 9, 2025
0f8bd55
[CI] Fix Llama 3.1 8B FP4 CI (#14699)
b8zhong Dec 9, 2025
b626334
fix: make override DeepseekV2Model work (#14707)
zhyncs Dec 9, 2025
66772aa
chore: add code owners for deepseek_v2.py (#14714)
zhyncs Dec 9, 2025
9a426fc
[CI] Move mistral large 3 basic to nightly (#14622)
alisonshao Dec 9, 2025
f0e948a
fix the deepep 8 gpu unit test (#14601)
rainj-me Dec 9, 2025
53d1708
Add fuse_marlin_moe test to ci and add new ep test (#14686)
BBuf Dec 9, 2025
cef5ba6
[Bugfix] Fix environ error in scheduler_runtime_checker_mixin.py (#14…
llfl Dec 9, 2025
fe7f91e
[Feat] Add received_time in serving_base (#13432)
zhanghaotong Dec 9, 2025
98c430e
fix: prevent HugginqFace access when SGLANG_USE_MODELSCOPE is enabled…
yrk111222 Dec 9, 2025
13680e5
[Test] Skip STANDALONE speculative decoding tests for different hidde…
alisonshao Dec 9, 2025
6ec7768
[diffusion] feat: support comparing batch perf (#14738)
Brain97 Dec 9, 2025
ab00487
Revert "[Feat] Add received_time in serving_base" (#14743)
merrymercy Dec 9, 2025
9496f12
[Model] Add PaddleOCR-VL Model Support (#12953)
yudian0504 Dec 9, 2025
15bc8cb
fix rope parameter initialization error caused by transformers v5.0 u…
yhyang201 Dec 9, 2025
8b98bb7
[model-gateway] optimize core modules (#14751)
slin1237 Dec 9, 2025
73df7a4
[SMG] perf: optimize tokenizer for reduced CPU and memory overhead (#…
slin1237 Dec 9, 2025
55504df
Add FP8 Blockwise GEMM Backend Flag `--fp8-gemm-backend` (#14379)
b8zhong Dec 9, 2025
8b0b6a4
fix: checking if tokenizer is in cache before downloading from HF (#1…
dougyster Dec 9, 2025
7c6fb3a
fix: making rate limit a warning instead of error (#14753)
dougyster Dec 9, 2025
036e64d
move multi-item scoring functions in tokenizer manager into a separat…
merrymercy Dec 9, 2025
18bd8e8
Improve CI by trying a warmup before unit tests (#14669)
merrymercy Dec 9, 2025
9ad02b7
[Perf] Optimize radix tree for cache-aware load balancin (#14758)
slin1237 Dec 9, 2025
0c63fb9
[Feature] Add LoRA support for embedding layers (#14177)
yushengsu-thu Dec 9, 2025
390406c
[model-gateway] release gateway 0.2.4 (#14763)
slin1237 Dec 10, 2025
a6dc7d2
[ci]: Enable the new hf API (#14687)
MingxuZh Dec 10, 2025
cbc7dcd
Re-add the API serving timing metrics. (#14744)
hnyls2002 Dec 10, 2025
c8d74fe
fix: adding rate limit warning at verify token permission stage (#14756)
dougyster Dec 10, 2025
5e8f544
Disable 8-gpu-b200 runner in PR tests (#14768)
alisonshao Dec 10, 2025
f077436
[fix] Fix issues for in-flight weight updates (#14064)
ShawnY112358 Dec 10, 2025
4285e99
[Auto Sync] Update data_parallel_controller.py, detokenizer... (20251…
merrymercy Dec 10, 2025
0183599
fix: race condition between validation and download locks (#14761)
alisonshao Dec 10, 2025
b0f531a
Fix VLM accuracy thresholds for nightly tests (#14777)
alisonshao Dec 10, 2025
b1cbfce
fix server args bug (#14725)
TomerBN-Nvidia Dec 10, 2025
793c98a
handling incomplete rope_scaling config ci after transformers upgrade…
yhyang201 Dec 10, 2025
b0a25d0
fix b200 ci (#14786)
b8zhong Dec 10, 2025
21028b5
[RL] support weight reload for low-bit rollout (#9650)
AniZpZ Dec 10, 2025
6c9c8da
fix: add missing logic for SGLANG_USE_MODELSCOPE variable (#14794)
yrk111222 Dec 10, 2025
56e5c07
fix b200 fa4 ci (#14788)
b8zhong Dec 10, 2025
87dbddd
[diffusion] profile: early exit when enough steps are captured to red…
mickqian Dec 10, 2025
03836d8
[GLM-4.6V] Support Pipeline Parallelism for GLM-4.6V & GLM-4.1V (#14720)
yuan-luo Dec 10, 2025
908c718
[diffusion] CI: Add LoRA support to diffusion server configuration an…
Prozac614 Dec 10, 2025
02f1e81
Revert "fix: checking if tokenizer is in cache before downloading fro…
yhyang201 Dec 10, 2025
12b7a4f
[diffusion] performance: refactor diffusion fuse qkv and apply to qwe…
BBuf Dec 10, 2025
766476f
[SMG-GO] implement a Go SGLang Model Gateway - OpenAI Compatible API …
whybeyoung Dec 10, 2025
d7f6320
[model-gateway] Dynamically Populate Tool Call Parser Choices (#14807)
xuwenyihust Dec 10, 2025
5eccaf7
Support HTTP response status code prometheus metrics (#14710)
fzyzcjy Dec 10, 2025
6634f67
Fix router keep nonzero metrics after worker is deleted (#14819)
fzyzcjy Dec 10, 2025
d85fecb
Tiny fix incorrect worker removal command (#14822)
fzyzcjy Dec 10, 2025
b8cfa02
[NPU] bug fix for mtp and w4a8 (#14806)
liupeng374 Dec 10, 2025
503880d
[CI] fix UT success check in `test_eagle_infer_beta_dp_attention.py` …
hnyls2002 Dec 10, 2025
f732f8e
Fix CI registry scan to only check test/registered directory (#14812)
alisonshao Dec 10, 2025
2543666
[model-gateway] add anthropic message api spec (#14834)
slin1237 Dec 10, 2025
83e35a7
[diffusion] doc: fix tiny typo in multimodal_gen/README.md (#14830)
wplf Dec 10, 2025
617e9b3
[model-gateway] support customizing Prometheus duration buckets (#14716)
fzyzcjy Dec 10, 2025
3d82c0f
[model-gateway] support engine response http status statistics in rou…
fzyzcjy Dec 10, 2025
1698c23
[CI] Reduce stage-b auto-partition from 4 to 2 (#14769)
alisonshao Dec 10, 2025
5b5571a
Apply back moe_sum_reduce for fused_marlin_moe (#14829)
ispobock Dec 10, 2025
6c5ebc0
[diffusion] parallel: pad tokens for video models under sp (#14833)
mickqian Dec 10, 2025
d659873
[diffusion] CI: use unified sampling_params for CI (#14045)
mickqian Dec 10, 2025
ef1ab23
[Auto Sync] Update tool_chat_template_deepseekv31.jinja (20251210) (#…
zhyncs Dec 10, 2025
c1bd5ee
Revert transformers to 4.57.1 (#14801)
yhyang201 Dec 10, 2025
e99ee0c
[model-gateway] Fix incompatible metric comparison in` PowerOfTwo` po…
ppraneth Dec 10, 2025
0e54a69
[bugfix] qwen25-VL support lora (#14638)
SYChen123 Dec 10, 2025
da9b801
fix lora target all + csgmv backend (#14796)
b8zhong Dec 10, 2025
c032b55
[model-gateway] adds default implementations to RouterTrait in mod.rs…
slin1237 Dec 10, 2025
c97ce39
[AMD] Add model to AMD nightly test (#14442)
michaelzhang-ai Dec 10, 2025
a499287
Treat unittest SkipTest exception as pass instead of as failure (#14847)
byjiang1996 Dec 10, 2025
ccf2602
[model-gateway] code clean up on oai router (#14850)
slin1237 Dec 10, 2025
bcc5483
[model-gateway] fix import order in oai conversation (#14851)
slin1237 Dec 10, 2025
c51efb8
fix fp8 gemm nightly CI (#14844)
b8zhong Dec 10, 2025
b6523a4
fix: restrict cache validation behaviors to CI only (#14849)
alisonshao Dec 11, 2025
25e9738
Fix CUDA version handling in ci_install_deepep.sh (#14854)
merrymercy Dec 11, 2025
312df1d
Fix TestGLM41VPPAccuracy test flakiness (#14848)
byjiang1996 Dec 11, 2025
bd7824b
Minor code style fix for dllm (#14836)
hnyls2002 Dec 11, 2025
7c98533
Enable TP for Mamba-based models (#14811)
roikoren755 Dec 11, 2025
7dcad45
[CI] Temp disable gb200 test (#14865)
Fridge003 Dec 11, 2025
8642dbe
Refactor Marlin MoeRunner (#14554)
trangdough Dec 11, 2025
e54307f
[6/n] Fix `num_token_non_padded` computation in prefill (#14313)
yuchengz816-bot Dec 11, 2025
32829b1
Remove myself to test CI gate issue (#14871)
Kangyan-Zhou Dec 11, 2025
1a96e66
fix: creating blobs only once for publish trace retries (#14845)
dougyster Dec 11, 2025
624725c
Move and update MindSpore docs, make it appear on the online document…
wangtiance Dec 11, 2025
b62fe85
fix nightly vlm ci : restore original eval for requests without regex…
yhyang201 Dec 11, 2025
2856624
Only count limitations for previous runs that reaches the test stage…
Kangyan-Zhou Dec 11, 2025
8348725
[CI][BUG] fix ib setup for disaggregation hicache test (#14877)
luketong777 Dec 11, 2025
a076d75
[Fix] Remove unused import from test_disaggregation_hicache.py (#14880)
ShangmingCai Dec 11, 2025
e52cf30
fix: adding temporary bypass for nightly tests (#14876)
dougyster Dec 11, 2025
f85460f
Avoid deleting entire cache for missing shards (#14754 follow-up) (#1…
alisonshao Dec 11, 2025
a368df2
Tiny add more error info for bench_serving (#14827)
fzyzcjy Dec 11, 2025
45eeeb9
Tiny support range ratio in GSP in bench serving (#14828)
fzyzcjy Dec 11, 2025
fca8e88
[diffusion] feat: enable torch compile to eliminate GPU bubble (#13641)
AichenF Dec 11, 2025
388018a
[NPU] adapt dsv3.2 nsa prefill context parallel (#14541)
liupeng374 Dec 11, 2025
5d804a3
[diffusion] feat: support sageattn & sageattn3 backend (#14878)
mickqian Dec 11, 2025
8f980dc
dsv32 multistream opt
ZhengdQin Dec 4, 2025
8e78e2e
clean code
ZhengdQin Dec 8, 2025
a88f8a2
delete renormalize in topk
ZhengdQin Dec 8, 2025
12afaab
dsv32 use batch_matmul_transpose in MTP
ZhengdQin Dec 8, 2025
28cd3b5
modify comment
ZhengdQin Dec 9, 2025
bf4dee5
Support dynamic w8a8
ZhengdQin Dec 9, 2025
9391ed0
dsv3 support ascend_fuseep
ZhengdQin Dec 11, 2025
f6f61a2
rebase modify
ZhengdQin Dec 11, 2025
516 changes: 378 additions & 138 deletions .github/CI_PERMISSIONS.json


55 changes: 32 additions & 23 deletions .github/CODEOWNERS
@@ -2,41 +2,50 @@
/docker @Fridge003 @ispobock @HaiShaw @ishandhanani
/docker/npu.Dockerfile @ping1jing2 @iforgetmyname
/python/pyproject.toml @merrymercy @Fridge003 @ispobock
/python/sglang/multimodal_gen @mickqian
/python/sglang/multimodal_gen @mickqian @yhyang201
/python/sglang/srt/batch_invariant_ops @Fridge003 @hebiao064
/python/sglang/srt/constrained @hnyls2002 @DarkSharpness
/python/sglang/srt/compilation @hebiao064
/python/sglang/srt/disaggregation @ByronHsu @hnyls2002 @ShangmingCai
/python/sglang/srt/disaggregation/ascend @ping1jing2 @iforgetmyname
/python/sglang/srt/distributed @yizhang2077 @merrymercy @ch-wan
/python/sglang/srt/entrypoints @ispobock @CatherineSue @slin1237 @merrymercy @JustinTong0323
/python/sglang/srt/entrypoints/grpc_server.py @CatherineSue @slin1237
/python/sglang/srt/eplb @fzyzcjy @ch-wan
/python/sglang/srt/function_call @CatherineSue @JustinTong0323
/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @kushanam @Edwardf0t1
/python/sglang/srt/grpc @CatherineSue @slin1237
/python/sglang/srt/hardware_backend/npu @ping1jing2 @iforgetmyname
/python/sglang/srt/layers @merrymercy @Ying1123 @Fridge003 @ispobock @HaiShaw @ch-wan @BBuf @Edwardf0t1
/python/sglang/srt/layers/attention/fla @yizhang2077 @hebiao064
/python/sglang/srt/layers/attention/hybrid_linear_attn_backend.py @yizhang2077 @hebiao064 @hanming-lu
/python/sglang/srt/layers/attention/mamba @yizhang2077 @hebiao064
/python/sglang/srt/layers/quantization @ch-wan @BBuf @Edwardf0t1 @FlamingoPg @AniZpZ
/python/sglang/srt/layers/attention/ascend_backend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/lora @Ying1123 @Fridge003 @lifuhuang
/python/sglang/srt/managers @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @zhyncs
/python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann
/python/sglang/srt/mem_cache/allocator_ascend.py @ping1jing2 @iforgetmyname
/python/sglang/srt/mem_cache @merrymercy @Ying1123 @hnyls2002 @xiezhq-hermann @hanming-lu @yizhang2077
/python/sglang/srt/model_executor @merrymercy @Ying1123 @hnyls2002 @Fridge003 @ispobock
/python/sglang/srt/model_executor/npu_graph_runner.py @ping1jing2 @iforgetmyname
/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201
/python/sglang/srt/model_executor/piecewise_cuda_graph_runner.py @hebiao064
/python/sglang/srt/models/deepseek_v2.py @fzyzcjy @zhyncs @ispobock @ch-wan @merrymercy @Fridge003
/python/sglang/srt/multimodal @mickqian @JustinTong0323 @yhyang201 @yuan-luo
/python/sglang/srt/speculative @Ying1123 @merrymercy @hnyls2002
/sgl-kernel @zhyncs @ispobock @BBuf @yizhang2077 @merrymercy @FlamingoPg @HaiShaw
/sgl-router @slin1237 @ByronHsu @CatherineSue
/sgl-router/benches @slin1237
/sgl-router/py_src @CatherineSue @key4ng @slin1237
/sgl-router/py_test @CatherineSue @key4ng
/sgl-router/src/config @slin1237
/sgl-router/src/core @slin1237
/sgl-router/src/data_connector @key4ng
/sgl-router/src/grpc_client @CatherineSue @slin1237
/sgl-router/src/mcp @key4ng @slin1237
/sgl-router/src/policies @slin1237 @ByronHsu
/sgl-router/src/proto @CatherineSue @slin1237
/sgl-router/src/protocols @CatherineSue @key4ng
/sgl-router/src/reasoning_parser @CatherineSue
/sgl-router/src/routers @CatherineSue @key4ng @slin1237
/sgl-router/src/tokenizer @slin1237 @CatherineSue
/sgl-router/src/tool_parser @slin1237 @CatherineSue
/sgl-model-gateway @slin1237 @CatherineSue
/sgl-model-gateway/benches @slin1237
/sgl-model-gateway/bindings/python @CatherineSue @key4ng @slin1237
/sgl-model-gateway/py_test @CatherineSue @key4ng
/sgl-model-gateway/src/config @slin1237
/sgl-model-gateway/src/core @slin1237
/sgl-model-gateway/src/data_connector @key4ng
/sgl-model-gateway/src/grpc_client @CatherineSue @slin1237
/sgl-model-gateway/src/mcp @key4ng @slin1237
/sgl-model-gateway/src/policies @slin1237 @ByronHsu
/sgl-model-gateway/src/proto @CatherineSue @slin1237
/sgl-model-gateway/src/protocols @CatherineSue @key4ng
/sgl-model-gateway/src/reasoning_parser @CatherineSue
/sgl-model-gateway/src/routers @CatherineSue @key4ng @slin1237
/sgl-model-gateway/src/tokenizer @slin1237 @CatherineSue
/sgl-model-gateway/src/tool_parser @slin1237 @CatherineSue
/sgl-model-gateway/src/wasm @slin1237
/sgl-model-gateway/examples/wasm @slin1237
/test/srt/ascend @ping1jing2 @iforgetmyname
/test/srt/test_modelopt* @Edwardf0t1
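The CODEOWNERS diff above renames every `/sgl-router/...` entry to `/sgl-model-gateway/...` and keeps the broad directory rule ahead of the more specific `src/...` rules. That ordering matters because GitHub resolves CODEOWNERS with last-match-wins semantics: the last pattern in the file that matches a path takes precedence. A minimal Python sketch of that resolution — the rule table is a hypothetical subset of the entries above, and the matcher is a simplification, not GitHub's actual glob engine:

```python
# Minimal sketch of CODEOWNERS resolution (not GitHub's implementation):
# the last matching pattern in the file takes precedence, which is why
# broad directory rules come before more specific sub-path rules.
# RULES is a hypothetical subset of the entries in the diff above.
RULES = [
    ("/sgl-model-gateway", ["@slin1237", "@CatherineSue"]),
    ("/sgl-model-gateway/src/policies", ["@slin1237", "@ByronHsu"]),
]

def owners_for(path: str) -> list[str]:
    matched: list[str] = []
    for pattern, owners in RULES:  # file order; later matches override
        if path == pattern or path.startswith(pattern + "/"):
            matched = owners
    return matched
```

Under these rules a change to `/sgl-model-gateway/src/policies/round_robin.rs` requests review from the policies owners, while any other file under `/sgl-model-gateway/` falls back to the directory-wide pair.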
6 changes: 3 additions & 3 deletions .github/MAINTAINER.md
@@ -25,19 +25,19 @@ __Note__: Difference between Merge Oncall and Codeowner
- The Codeowner is a passive protection role provided by GitHub; it prevents accidental changes to critical code.
- The list of Merge Oncalls is attached below. The list of Codeowners is in the [CODEOWNERS](./CODEOWNERS) file.

__Note__: The permissions to trigger CI tests are defined separately according to these [rules](https://docs.sglang.ai/developer_guide/contribution_guide.html#how-to-trigger-ci-tests).
__Note__: The permissions to trigger CI tests are defined separately according to these [rules](https://docs.sglang.io/developer_guide/contribution_guide.html#how-to-trigger-ci-tests).


## Pull Request Merge Process
1. The author submits a pull request (PR) and fills out the PR checklist.
2. A bot assigns this PR to a Merge Oncall and @-mentions them. At the same time, GitHub will automatically request reviews from Codeowners.
3. Someone tags the PR with a `run-ci` label ([help](https://docs.sglang.ai/developer_guide/contribution_guide.html#how-to-trigger-ci-tests)). Then the author can trigger CI by pushing new commits.
3. Someone tags the PR with a `run-ci` label ([help](https://docs.sglang.io/developer_guide/contribution_guide.html#how-to-trigger-ci-tests)). Then the author can trigger CI by pushing new commits.
4. The Merge Oncall coordinates the review (e.g., asking people to review) and approves the PR; the Codeowners also approve the PR. If the assigned Merge Oncall is not responsive, the author can ping other related Merge Oncalls and Reviewers in the list below.
5. The code can now be merged:
- **Ideal case:** For each modified file, one Codeowner has approved the PR. The PR has also passed the required CI tests. Then, anyone with write permission can merge the PR.
- **Exception:** In cases where it is difficult to meet all requirements (due to flaky CI or slow responses), a Merge Oncall can bypass branch protection to merge the PR.

If you meet any issues during the merge, you can discuss in [slack channels](https://slack.sglang.ai/): #dev, #pull-request, and #ci-cd-build-release.
If you meet any issues during the merge, you can discuss in [slack channels](https://slack.sglang.io/): #dev, #pull-request, and #ci-cd-build-release.

## The List of Merge Oncalls and Reviewers
The format is @github-username (Slack username).
35 changes: 33 additions & 2 deletions .github/labeler.yml
@@ -1,10 +1,10 @@
# Configuration for the GitHub Labeler action
# Automatically adds labels to PRs based on the files changed

# Router specific (Rust code in sgl-router)
# Router specific (Rust code in sgl-model-gateway)
model-gateway:
- changed-files:
- any-glob-to-any-file: 'sgl-router/**/*'
- any-glob-to-any-file: 'sgl-model-gateway/**/*'

# Kernel specific
sgl-kernel:
@@ -40,6 +40,11 @@ Multi-modal:
- '**/*vision*'
- '**/*vlm*'

# Diffusion
diffusion:
- changed-files:
- any-glob-to-any-file: 'python/sglang/multimodal_gen/**/*'

# LoRA
lora:
- changed-files:
@@ -66,6 +71,22 @@ amd:
- '**/*amd*'
- '**/*rocm*'

# NPU specific
npu:
- changed-files:
- any-glob-to-any-file:
- '**/*npu*'
- '**/*ascend*'

# Blackwell
blackwell:
- changed-files:
- any-glob-to-any-file:
- '**/*nvfp4*'
- 'sgl-kernel/csrc/attention/cutlass_sm100_mla/**/*'
- 'python/sglang/srt/layers/attention/trtllm_mla_backend.py'
- 'python/sglang/srt/layers/attention/trtllm_mha_backend.py'

# DeepSeek specific
deepseek:
- changed-files:
@@ -77,3 +98,13 @@ hicache:
- changed-files:
- any-glob-to-any-file:
- '**/*hicache*'

# Deterministic
deterministic:
- changed-files:
- any-glob-to-any-file: 'python/sglang/srt/batch_invariant_ops/**/*'

# Piecewise CUDA Graph
piecewise-cuda-graph:
- changed-files:
- any-glob-to-any-file: 'python/sglang/srt/compilation/**/*'
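The new `diffusion`, `npu`, `deterministic`, and `piecewise-cuda-graph` entries above drive automatic PR labeling from the paths of changed files. A rough Python approximation of that classification — the real `actions/labeler` action evaluates these globs with minimatch in JavaScript, so `fnmatch` here is only an illustrative stand-in with slightly different `**` semantics:

```python
# Sketch of how the new labeler globs classify changed files.
# fnmatch is an approximation: the actual action uses minimatch,
# whose "**" can also match zero path segments.
from fnmatch import fnmatch

LABEL_GLOBS = {
    "diffusion": ["python/sglang/multimodal_gen/**/*"],
    "npu": ["**/*npu*", "**/*ascend*"],
    "deterministic": ["python/sglang/srt/batch_invariant_ops/**/*"],
    "piecewise-cuda-graph": ["python/sglang/srt/compilation/**/*"],
}

def labels_for(changed_file: str) -> set[str]:
    """Return every label whose glob list matches the changed file."""
    return {
        label
        for label, globs in LABEL_GLOBS.items()
        if any(fnmatch(changed_file, g) for g in globs)
    }
```

A single changed file can pick up several labels at once, since each label's globs are evaluated independently.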
12 changes: 6 additions & 6 deletions .github/pull_request_template.md
@@ -1,4 +1,4 @@
<!-- Thank you for your contribution! Please follow these guidelines to enhance your pull request. If anything is unclear, submit your PR and reach out to maintainers for assistance. Join our Slack community at https://slack.sglang.ai to discuss further. -->
<!-- Thank you for your contribution! Please follow these guidelines to enhance your pull request. If anything is unclear, submit your PR and reach out to maintainers for assistance. Join our Slack community at https://slack.sglang.io to discuss further. -->

## Motivation

@@ -18,9 +18,9 @@

## Checklist

- [ ] Format your code according to the [Format code with pre-commit](https://docs.sglang.ai/developer_guide/contribution_guide.html#format-code-with-pre-commit).
- [ ] Add unit tests according to the [Run and add unit tests](https://docs.sglang.ai/developer_guide/contribution_guide.html#run-and-add-unit-tests).
- [ ] Update documentation according to [Write documentations](https://docs.sglang.ai/developer_guide/contribution_guide.html#write-documentations).
- [ ] Provide accuracy and speed benchmark results according to [Test the accuracy](https://docs.sglang.ai/developer_guide/contribution_guide.html#test-the-accuracy) and [Benchmark the speed](https://docs.sglang.ai/developer_guide/contribution_guide.html#benchmark-the-speed).
- [ ] Follow the SGLang code style [guidance](https://docs.sglang.ai/developer_guide/contribution_guide.html#code-style-guidance).
- [ ] Format your code according to the [Format code with pre-commit](https://docs.sglang.io/developer_guide/contribution_guide.html#format-code-with-pre-commit).
- [ ] Add unit tests according to the [Run and add unit tests](https://docs.sglang.io/developer_guide/contribution_guide.html#run-and-add-unit-tests).
- [ ] Update documentation according to [Write documentations](https://docs.sglang.io/developer_guide/contribution_guide.html#write-documentations).
- [ ] Provide accuracy and speed benchmark results according to [Test the accuracy](https://docs.sglang.io/developer_guide/contribution_guide.html#test-the-accuracy) and [Benchmark the speed](https://docs.sglang.io/developer_guide/contribution_guide.html#benchmark-the-speed).
- [ ] Follow the SGLang code style [guidance](https://docs.sglang.io/developer_guide/contribution_guide.html#code-style-guidance).
- [ ] Work with maintainers to merge your PR. See the [PR Merge Process](https://github.com/sgl-project/sglang/blob/main/.github/MAINTAINER.md#pull-request-merge-process)
10 changes: 10 additions & 0 deletions .github/workflows/auto-tune.yml
@@ -0,0 +1,10 @@
name: Auto tune

on:
workflow_dispatch:

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
13 changes: 13 additions & 0 deletions .github/workflows/bot-bump-kernel-version-to-sglang.yml
@@ -10,6 +10,9 @@ permissions:
jobs:
bump-kernel-version-to-sglang:
runs-on: ubuntu-latest
outputs:
branch_name: ${{ steps.set_output.outputs.branch_name }}
needs_sync: ${{ steps.check_sync.outputs.needs_sync }}
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -32,6 +35,7 @@ jobs:

- name: Configure Git and branch
if: steps.check_sync.outputs.needs_sync == 'true'
id: set_output
run: |
git config user.name "sglang-bot"
git config user.email "sglang-bot@users.noreply.github.com"
@@ -41,6 +45,7 @@
git checkout -b "$BRANCH_NAME"
echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV
echo "KERNEL_VERSION=$KERNEL_VERSION" >> $GITHUB_ENV
echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT

- name: Run kernel version bump script
if: steps.check_sync.outputs.needs_sync == 'true'
@@ -53,3 +58,11 @@ jobs:
GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
run: |
bash scripts/release/commit_and_pr_kernel_to_sglang.sh "$KERNEL_VERSION" "$BRANCH_NAME"

run-nightly-tests:
needs: bump-kernel-version-to-sglang
if: needs.bump-kernel-version-to-sglang.outputs.needs_sync == 'true'
uses: ./.github/workflows/nightly-test-nvidia.yml
with:
ref: ${{ needs.bump-kernel-version-to-sglang.outputs.branch_name }}
secrets: inherit
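Calling `nightly-test-nvidia.yml` via `uses:` requires that workflow to declare a `workflow_call` trigger accepting the `ref` input passed above. That callee side is not shown in this diff; a minimal sketch of what it would need (the checkout fallback logic is an assumption):

```yaml
# Hypothetical trigger block for nightly-test-nvidia.yml, the reusable
# workflow invoked above. The input name matches the caller's `with: ref`.
on:
  workflow_call:
    inputs:
      ref:
        description: "Git ref to check out for the nightly run"
        required: false
        type: string

jobs:
  nightly:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          # Fall back to the default ref when no input is provided
          ref: ${{ inputs.ref || github.ref }}
```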
11 changes: 11 additions & 0 deletions .github/workflows/bot-bump-sglang-version.yml
@@ -15,6 +15,8 @@ permissions:
jobs:
bump-sglang-version:
runs-on: ubuntu-latest
outputs:
branch_name: ${{ steps.set_output.outputs.branch_name }}
steps:
- name: Checkout code
uses: actions/checkout@v4
@@ -31,13 +33,15 @@ jobs:
pip install tomli

- name: Configure Git and branch
id: set_output
run: |
git config user.name "sglang-bot"
git config user.email "sglang-bot@users.noreply.github.com"
RANDOM_SUFFIX=$(echo $RANDOM | md5sum | head -c 4)
BRANCH_NAME="bot/bump-sglang-version-${{ github.event.inputs.new_version }}-${RANDOM_SUFFIX}"
git checkout -b "$BRANCH_NAME"
echo "BRANCH_NAME=$BRANCH_NAME" >> $GITHUB_ENV
echo "branch_name=$BRANCH_NAME" >> $GITHUB_OUTPUT

- name: Run SGLang version bump script
run: |
@@ -48,3 +52,10 @@
GH_TOKEN: ${{ secrets.GH_PAT_FOR_PULL_REQUEST }}
run: |
bash scripts/release/commit_and_pr.sh "SGLang" "${{ github.event.inputs.new_version }}" "$BRANCH_NAME"

run-nightly-tests:
needs: bump-sglang-version
uses: ./.github/workflows/nightly-test-nvidia.yml
with:
ref: ${{ needs.bump-sglang-version.outputs.branch_name }}
secrets: inherit
10 changes: 7 additions & 3 deletions .github/workflows/cancel-all-pending-pr-test-runs.yml
@@ -32,14 +32,18 @@ jobs:

for workflow_file in "${WORKFLOW_FILES[@]}"; do
echo "--- Checking workflow: $workflow_file ---"

# Fetch list and pipe to while loop
gh run list \
--repo "$REPO" \
--workflow "$workflow_file" \
--json databaseId,status \
--limit 1000 \
| jq -r '.[] | select(.status=="queued" or .status=="in_progress") | .databaseId' \
| jq -r '.[] | select(.status=="queued" or .status=="in_progress" or .status=="waiting") | .databaseId' \
| while read run_id; do
echo "Cancelling run ID: $run_id for workflow: $workflow_file"
gh run cancel "$run_id" --repo "$REPO"
echo "Attempting to cancel run ID: $run_id for workflow: $workflow_file"

# The "|| echo ..." part prevents the script from crashing if cancellation fails
gh run cancel "$run_id" --repo "$REPO" || echo "⚠️ Could not cancel run $run_id (it may have already completed). Continuing..."
done
done
4 changes: 2 additions & 2 deletions .github/workflows/ci-failure-monitor.yml
@@ -8,7 +8,7 @@ on:
limit:
description: 'Number of workflow runs to analyze (across all workflows)'
required: false
default: '800'
default: '1000'
type: string
threshold:
description: 'Alert threshold for consecutive failures'
@@ -51,7 +51,7 @@ jobs:
cd scripts/ci_monitor
python ci_failures_analysis.py \
--token $GITHUB_TOKEN \
--limit ${{ inputs.limit || '800' }} \
--limit ${{ inputs.limit || '1000' }} \
--threshold ${{ inputs.threshold || '4' }} \
--output ci_failure_analysis_$(date +%Y%m%d_%H%M%S).json

10 changes: 10 additions & 0 deletions .github/workflows/ci-monitor.yml
@@ -46,6 +46,15 @@ jobs:
cd scripts/ci_monitor
python ci_analyzer.py --token $GITHUB_TOKEN --limit ${{ inputs.limit || '1000' }} --output ci_analysis_$(date +%Y%m%d_%H%M%S).json

- name: Run Nightly Test Analysis
env:
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
PYTHONUNBUFFERED: 1
PYTHONIOENCODING: utf-8
run: |
cd scripts/ci_monitor
python ci_analyzer.py --token $GITHUB_TOKEN --mode nightly --days 2 --output nightly_analysis_$(date +%Y%m%d_%H%M%S).json

- name: Run Performance Analysis
env:
GITHUB_TOKEN: ${{ secrets.GH_PAT_FOR_NIGHTLY_CI_DATA }}
@@ -61,6 +70,7 @@
name: ci-analysis-results-${{ github.run_number }}
path: |
scripts/ci_monitor/ci_analysis_*.json
scripts/ci_monitor/nightly_analysis_*.json
scripts/ci_monitor/performance_tables_*
retention-days: 30

6 changes: 3 additions & 3 deletions .github/workflows/lint.yml
@@ -35,16 +35,16 @@ jobs:

- name: Check proto files are in sync
run: |
if ! diff -q python/sglang/srt/grpc/sglang_scheduler.proto sgl-router/src/proto/sglang_scheduler.proto; then
if ! diff -q python/sglang/srt/grpc/sglang_scheduler.proto sgl-model-gateway/src/proto/sglang_scheduler.proto; then
echo "❌ ERROR: Proto files are out of sync!"
echo ""
echo "The following files must be kept identical:"
echo " - python/sglang/srt/grpc/sglang_scheduler.proto"
echo " - sgl-router/src/proto/sglang_scheduler.proto"
echo " - sgl-model-gateway/src/proto/sglang_scheduler.proto"
echo ""
echo "Please ensure both files have the same content."
echo ""
echo "Differences:"
diff python/sglang/srt/grpc/sglang_scheduler.proto sgl-router/src/proto/sglang_scheduler.proto || true
diff python/sglang/srt/grpc/sglang_scheduler.proto sgl-model-gateway/src/proto/sglang_scheduler.proto || true
exit 1
fi
8 changes: 4 additions & 4 deletions .github/workflows/nightly-release-gateway.yml
@@ -50,9 +50,9 @@ jobs:
with:
path: sglang-repo

- name: Move sgl-router folder to root and delete sglang-repo
- name: Move sgl-model-gateway folder to root and delete sglang-repo
run: |
mv sglang-repo/sgl-router/* .
mv sglang-repo/sgl-model-gateway/* .
rm -rf sglang-repo
ls -alt
shell: bash
@@ -138,9 +138,9 @@ jobs:
with:
path: sglang-repo

- name: Move sgl-router folder to root and delete sglang-repo
- name: Move sgl-model-gateway folder to root and delete sglang-repo
run: |
mv sglang-repo/sgl-router/* .
mv sglang-repo/sgl-model-gateway/* .
rm -rf sglang-repo
ls -alt

2 changes: 1 addition & 1 deletion .github/workflows/nightly-test-amd.yml
@@ -19,7 +19,7 @@ jobs:
if: github.repository == 'sgl-project/sglang' || github.event_name == 'pull_request'
strategy:
matrix:
runner: [linux-mi300-gpu-2, linux-mi325-gpu-2-nightly]
runner: [linux-mi325-gpu-2]
runs-on: ${{matrix.runner}}
steps:
- name: Checkout code