[sgl] improve accuracy of additional page requirement during spec decode #22406
hnyls2002 merged 3 commits into sgl-project:main from
Conversation
|
@2022tgoel could you fix the lint first? |
Qiaolin-Yu left a comment
I feel like it's correct. But also cc @hnyls2002 for another check in case I might be missing some context.
|
/tag-and-rerun-ci |
|
/rerun-test test_eagle_infer_a.py test_eagle_infer_b.py test_eagle_infer_beta.py test_eagle3_basic.py test_specv2_kvcache_offloading.py test_swa_radix_cache_kl.py test_swa_unittest.py test_eagle_dp_attention.py |
|
✅ ✅ ✅ ✅ ✅ ✅ ✅ ✅ |
Motivation
The `new_tokens_required_next_decode` calculation was very conservative in calculating how many pages a batch will consume. I would like to replace it with a more realistic estimate that mimics the logic in `eagle_info_v2.py`.
Modifications
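For illustration only, here is a minimal sketch of why a worst-case page count can overshoot. This is not the code from this PR or from `eagle_info_v2.py`. The function names, the fixed `tokens_per_step`, and the page size are all hypothetical, and the sketch assumes each request appends the same number of token slots per decode step into a paged KV cache.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def worst_case_new_pages(seq_lens, tokens_per_step, page_size):
    # Hypothetical conservative estimate: ignore free slots left in each
    # request's last page and assume all new tokens land in fresh pages.
    return sum(ceil_div(tokens_per_step, page_size) for _ in seq_lens)


def estimated_new_pages(seq_lens, tokens_per_step, page_size):
    # Hypothetical tighter estimate: count only the pages each request
    # actually crosses into, given its current length.
    return sum(
        ceil_div(n + tokens_per_step, page_size) - ceil_div(n, page_size)
        for n in seq_lens
    )


# Three requests, 4 new token slots each, 16-token pages (all assumed values).
seq_lens = [17, 16, 30]
conservative = worst_case_new_pages(seq_lens, tokens_per_step=4, page_size=16)
tighter = estimated_new_pages(seq_lens, tokens_per_step=4, page_size=16)
print(conservative, tighter)
```

In this toy case the request of length 17 still has room in its last page, so the tighter estimate counts no new page for it while the worst-case estimate counts one.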
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci