[Spec][Ngram] Misc enhance support for multiple SAMs #22294
hnyls2002 merged 11 commits into sgl-project:main from
/tag-and-rerun-ci
/rerun-test test/registered/unit/spec/test_ngram_corpus.py
/rerun-test test/registered/spec/test_ngram_speculative_decoding.py
✅
✅
hnyls2002 left a comment:
- `NgramCorpus.load_external_corpus_named` has partial replace handling (`_corpus_token_counts.pop(corpus_id, 0)`) and a dedicated test (`test_replace_corpus_respects_budget`), but replace is not a defined API semantic: the HTTP layer auto-generates a UUID, and C++ silently overwrites via `std::move`. Either reject a duplicate `corpus_id` with an error, or explicitly support replace end-to-end.
- `NGRAMWorker.remaining_corpus_token_budget` is dead code: it has no caller in the codebase. Remove it or wire it into the `list_external_corpora` response.
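The reviewer's first option (reject a duplicate `corpus_id` outright) fits naturally with per-corpus token counts plus a running total. The sketch below is a hypothetical illustration, not sglang's `NgramCorpus`. The class `CorpusTokenBudget`, its `add`/`remove` methods, and the `remaining` property are made-up names. The sketch also assumes that a corpus's token count is known before it is committed.

```python
from typing import Dict


class CorpusTokenBudget:
    """Hypothetical per-corpus token accounting under one global cap."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self._counts: Dict[str, int] = {}  # corpus_id -> token count
        self._total = 0                    # running total across corpora

    @property
    def remaining(self) -> int:
        # Tokens still available before hitting the global cap.
        return self.max_tokens - self._total

    def add(self, corpus_id: str, num_tokens: int) -> None:
        # No replace semantics: a duplicate id is an error, not an overwrite.
        if corpus_id in self._counts:
            raise ValueError(f"corpus {corpus_id!r} already loaded")
        # Check before committing so a rejected add leaves state unchanged.
        if self._total + num_tokens > self.max_tokens:
            raise ValueError(
                f"loading {corpus_id!r} ({num_tokens} tokens) exceeds budget; "
                f"{self.remaining} tokens remaining"
            )
        self._counts[corpus_id] = num_tokens
        self._total += num_tokens

    def remove(self, corpus_id: str) -> None:
        # Returns the corpus's tokens to the budget; KeyError if unknown.
        self._total -= self._counts.pop(corpus_id)
```

This sketch checks the budget before committing. The PR description takes a different approach: it rolls back the just-loaded corpus when `external_corpus_max_tokens` is exceeded. The same accounting supports both orders.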
@hnyls2002 Replace is actually a defined API semantic. The HTTP layer only auto-generates
sglang/python/sglang/srt/managers/tokenizer_communicator_mixin.py, lines 378 to 381 in c89afae
However, one good point here is that replacing a corpus is a non-atomic action, and correctly interacting with
In favor of the upcoming PR about SAM eviction, I will remove all replace paths. If a user wants to replace a corpus by
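Here is one way to see why replace is non-atomic. Suppose replace is implemented as remove-then-load: if the load fails, the old corpus has already been discarded. The sketch below uses hypothetical names (`StagedCorpusStore`, `load`, `cancel_load`, `naive_replace`) and models corpora as plain token lists. It builds each new corpus in a staging slot and commits it only on success, so a failure drops just the staged data. The staging idea is loosely inspired by the `cancel_external_corpus_load()` path in the PR description; it is not that implementation.

```python
from typing import Callable, Dict, List, Optional


class StagedCorpusStore:
    """Hypothetical store: corpora are built in a staging slot and
    committed to `sams` only after the build succeeds."""

    def __init__(self) -> None:
        self.sams: Dict[str, List[int]] = {}       # committed corpora
        self._staging: Optional[List[int]] = None  # in-progress build

    def cancel_load(self) -> None:
        # Discard only the half-built corpus; committed corpora stay intact.
        self._staging = None

    def load(self, corpus_id: str, build: Callable[[], List[int]]) -> None:
        # No replace path: an existing id is rejected instead of overwritten.
        if corpus_id in self.sams:
            raise ValueError(f"corpus {corpus_id!r} already exists")
        try:
            self._staging = build()
        except Exception:
            self.cancel_load()
            raise
        # Commit step: move the staged corpus into the live set.
        self.sams[corpus_id] = self._staging
        self._staging = None

    def remove(self, corpus_id: str) -> None:
        del self.sams[corpus_id]


def naive_replace(store: StagedCorpusStore, corpus_id: str,
                  build: Callable[[], List[int]]) -> None:
    # Remove-then-load is two separate steps: if build() raises,
    # the old corpus is already gone and nothing takes its place.
    store.remove(corpus_id)
    store.load(corpus_id, build)
```

With this shape, a failed `load` leaves `store.sams` unchanged. A failed `naive_replace` leaves the target id missing entirely.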
…xternal_corpus_named and remove_external_corpus
…/sglang into kp/multi-sam-http-api-misc-fix
@hnyls2002 some changes since last review
/rerun-test test/registered/unit/spec/test_ngram_corpus.py
/rerun-test test/registered/spec/test_ngram_speculative_decoding.py
✅
✅
Motivation
Part of the Ngram refactoring series #21052.
Follows #22203.
Miscellaneous behavioral fixes and improvements for the multi-SAM support in #22203.
Modifications
- `Ngram::resetStagingSam()` C++ method, exposed via FFI as `cancel_external_corpus_load()`. On load failure, the Python wrapper now calls this instead of `clear_external_corpus()`, so previously loaded corpora are preserved when a new load fails.
- The `add_external_corpus`, `remove_external_corpus`, and `list_external_corpora` HTTP handlers now return an error early if `speculative_algorithm != "NGRAM"` instead of crashing.
- `_Communicator.merge_results(results)` aggregates success/message across all DP ranks instead of returning only `results[0]`.
- `NgramCorpus` now tracks per-corpus token counts and a running total, enforcing `external_corpus_max_tokens` as a global budget. Exceeding the limit rolls back the just-loaded corpus and raises `ValueError`. Removing a corpus frees its tokens from the budget.
- `remaining_token_budget` property on both `NgramCorpus` and `NGRAMWorker`.
- `Ngram::reset()` clarifying that it preserves external corpora (`sams_`).
- `test_remove_frees_token_budget`, `test_replace_corpus_respects_budget`, and `test_error_on_load_preserves_existing_corpora` to make sure corpora are not loaded beyond the `external_corpus_max_tokens` threshold.
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci