Fix decode OOM caused by retraction by hnyls2002 · Pull Request #14939 · sgl-project/sglang

hnyls2002 · 2025-12-12T03:35:22Z

In the retract tokens calculation, we should also consider the multiplier.
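The conversation does not include the diff, so the following is only a minimal sketch of the idea, not SGLang's actual code. It assumes that the "multiplier" is the number of KV-cache slots a single request may consume per decode step (for example under speculative decoding, which the linked issues mention). Every name here (`RetractConfig`, `token_multiplier`, `required_tokens`, `count_retractions`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetractConfig:
    # Hypothetical fields for illustration; not SGLang's real configuration.
    retract_decode_steps: int  # decode steps of headroom to reserve per request
    token_multiplier: int = 1  # assumed KV slots consumed per request per step


def required_tokens(num_reqs: int, cfg: RetractConfig) -> int:
    """Headroom needed so that num_reqs requests can keep decoding.

    Omitting token_multiplier here would underestimate the need whenever
    a request can consume more than one slot per step.
    """
    return num_reqs * cfg.retract_decode_steps * cfg.token_multiplier


def count_retractions(held_tokens: List[int], free_tokens: int,
                      cfg: RetractConfig) -> int:
    """Retract requests from the end until the rest have enough headroom.

    held_tokens[i] is the number of KV slots request i currently holds;
    retracting a request returns its slots to the free pool.
    """
    remaining = list(held_tokens)
    retracted = 0
    while len(remaining) > 1 and free_tokens < required_tokens(len(remaining), cfg):
        free_tokens += remaining.pop()
        retracted += 1
    return retracted


held = [100, 100, 100, 100]
without_multiplier = count_retractions(held, 60, RetractConfig(20, 1))
with_multiplier = count_retractions(held, 60, RetractConfig(20, 4))
```

In this toy setup, ignoring the multiplier retracts one request, while a multiplier of 4 retracts two. Under these assumptions, an estimate that leaves the multiplier out keeps more requests running than the free pool can serve, which is the kind of shortfall that could surface as a decode OOM.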

gemini-code-assist · 2025-12-12T03:35:25Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

hnyls2002 · 2025-12-13T04:08:03Z

/tag-and-rerun-ci

…n_eagle3_npu

* 'main' of https://github.com/sgl-project/sglang: (25 commits)
  [NPU] perf update with kvcache nz & w4a8 quant (sgl-project#14423)
  [PP Prefill][NIXL] Fix PP mode transfer completion tracking to wait for all ranks (sgl-project#15027)
  Fix GLM-4.6 tool calls don't support streaming output for arguments i… (sgl-project#13989)
  feature: adding nightly wheel workflow and indexer (sgl-project#14924)
  [diffusion] feat: Improve LoRA compatibility by adding unified format detection and diffusers-based normalization (sgl-project#14659)
  [Fix] Disable trtllm moe backend for draft model for a qucik fix (sgl-project#15002)
  [diffusion] fix: use NDRotaryEmbedding in flux_2 (sgl-project#15034)
  Mistral Large 3 NVFP4 support (sgl-project#14485)
  call check_quantized_moe_compatibility after initialize (sgl-project#13876)
  Add sgl_router_attempt_http_responses_total for single attempt information (sgl-project#15037)
  Add error code in prometheus metrics and add X-SMG-Error-Code header (sgl-project#15036)
  Provide more fine grained error reason for reqwest error (sgl-project#15032)
  Tiny change http router response format to unify (sgl-project#15031)
  Tiny unify grpc existing error responses into new format (sgl-project#15030)
  Add `code` field and unify error responses for router (sgl-project#15028)
  Super tiny remove unused log_request (sgl-project#15035)
  Fix decode OOM caused by retraction (sgl-project#14939)
  [CI]Add gb200 runner back (sgl-project#15024)
  Add a special label for b200 CI runner that can run kernel tests (sgl-project#15033)
  Fix regression caused by fa3 block_table (sgl-project#15009)
  ...

# Conflicts:
#	python/sglang/srt/hardware_backend/npu/attention/ascend_backend.py

hnyls2002 requested review from Ying1123, merrymercy, xiezhq-hermann and zhyncs as code owners December 12, 2025 03:35

hnyls2002 force-pushed the lsyin/fix-spec-retract branch from 6588d2b to eaf76d3 Compare December 12, 2025 03:40

hnyls2002 added 2 commits December 12, 2025 12:42

add retraction test

3bdd050

apply multiplier in retract decode

a80caef

hnyls2002 force-pushed the lsyin/fix-spec-retract branch from eaf76d3 to a80caef Compare December 12, 2025 03:42

hnyls2002 added 2 commits December 12, 2025 13:59

fix

34e0af1

Merge branch 'main' into lsyin/fix-spec-retract

d907d78

hnyls2002 requested review from Fridge003 and ispobock as code owners December 13, 2025 03:12

github-actions bot added the run-ci label Dec 13, 2025

hnyls2002 merged commit 01e3b3f into main Dec 13, 2025
104 of 116 checks passed

hnyls2002 deleted the lsyin/fix-spec-retract branch December 13, 2025 04:59

This was referenced Dec 13, 2025

[Bug] Decode OOM with spec + retract #14942

Closed

Fix IMA with flashinfer + spec + topk & Add radix attention test cases for eagle #13740

Merged

This was referenced Dec 13, 2025

[Bugfix] fix decode oom with EAGLE #14645

Closed

[Bug] Decode OOM with spec #13741

Closed

Prozac614 pushed a commit to Prozac614/sglang that referenced this pull request Dec 17, 2025

Fix decode OOM caused by retraction (sgl-project#14939)

e94192a

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

Fix decode OOM caused by retraction (sgl-project#14939)

c4dd723

TZHelloWorld pushed a commit to TZHelloWorld/sglang that referenced this pull request Mar 4, 2026

cherry-pick: Fix decode OOM caused by retraction (sgl-project#14939)

2562069

hnyls2002 merged 4 commits into main from lsyin/fix-spec-retract
