
[sgl] improve accuracy of additional page requirement during spec decode #22406

Merged
hnyls2002 merged 3 commits into sgl-project:main from 2022tgoel:tarushii/mtp-6 on Apr 16, 2026


Conversation

@2022tgoel
Contributor

Motivation

The new_tokens_required_next_decode calculation was overly conservative in estimating how many pages a batch will consume. I would like to replace it with a more realistic estimate that mirrors the logic in eagle_info_v2.py.
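
To illustrate why a worst-case page estimate can overshoot, here is a minimal sketch. It is not the code from this PR or from eagle_info_v2.py: the function names, the fixed per-request token count, and the "upper bound" baseline are all assumptions made for illustration. It compares a per-request bound that assumes every request starts on a fresh page against a count that uses each request's actual position within its last page.

```python
from typing import Sequence


def pages_held(seq_len: int, page_size: int) -> int:
    """Number of pages needed to hold seq_len tokens (integer ceiling division)."""
    return (seq_len + page_size - 1) // page_size


def extra_pages_upper_bound(
    seq_lens: Sequence[int], new_tokens: int, page_size: int
) -> int:
    """Hypothetical conservative estimate: assume every request's last page is
    already full, so all new tokens land on newly allocated pages."""
    return len(seq_lens) * pages_held(new_tokens, page_size)


def extra_pages_page_aware(
    seq_lens: Sequence[int], new_tokens: int, page_size: int
) -> int:
    """Hypothetical page-aware estimate: count only the pages each request
    actually crosses into, given the free slots left in its last page."""
    return sum(
        pages_held(seq_len + new_tokens, page_size) - pages_held(seq_len, page_size)
        for seq_len in seq_lens
    )


if __name__ == "__main__":
    seq_lens = [5, 16, 31]  # illustrative current lengths of three requests
    print(extra_pages_upper_bound(seq_lens, new_tokens=4, page_size=16))
    print(extra_pages_page_aware(seq_lens, new_tokens=4, page_size=16))
```

In this sketch the page-aware count can never exceed the upper bound, and it is strictly smaller whenever some request still has enough free slots in its last page to absorb the new tokens.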

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ispobock
Collaborator

@2022tgoel could you fix the lint first?

Collaborator

@Qiaolin-Yu Qiaolin-Yu left a comment

I feel like it's correct. But also cc @hnyls2002 for another check in case I might be missing some context.

@hnyls2002
Collaborator

/tag-and-rerun-ci

@hnyls2002
Collaborator

/rerun-test test_eagle_infer_a.py test_eagle_infer_b.py test_eagle_infer_beta.py test_eagle3_basic.py test_specv2_kvcache_offloading.py test_swa_radix_cache_kl.py test_swa_unittest.py test_eagle_dp_attention.py

@github-actions
Contributor

1-gpu-h100: View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_a.py

1-gpu-h100: View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_b.py

1-gpu-5090: View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py

1-gpu-5090: View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle3_basic.py

1-gpu-5090: View workflow run

cd test/ && python3 registered/disaggregation/test_specv2_kvcache_offloading.py

1-gpu-h100: View workflow run

cd test/ && python3 registered/radix_cache/test_swa_radix_cache_kl.py

1-gpu-h100: View workflow run

cd test/ && python3 registered/unit/mem_cache/test_swa_unittest.py

4-gpu-h100: View workflow run

cd test/ && python3 registered/spec/eagle/test_eagle_dp_attention.py

@hnyls2002 hnyls2002 merged commit 2211b4d into sgl-project:main Apr 16, 2026
82 of 131 checks passed
jmamou pushed a commit to jmamou/sglang that referenced this pull request Apr 20, 2026
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026
@hnyls2002 hnyls2002 mentioned this pull request Apr 29, 2026

4 participants