Skip to content

[Spec][Ngram] Misc enhance support for multiple SAMs#22294

Merged
hnyls2002 merged 11 commits intosgl-project:mainfrom
kpham-sgl:kp/multi-sam-http-api-misc-fix
Apr 9, 2026
Merged

[Spec][Ngram] Misc enhance support for multiple SAMs#22294
hnyls2002 merged 11 commits intosgl-project:mainfrom
kpham-sgl:kp/multi-sam-http-api-misc-fix

Conversation

@kpham-sgl
Copy link
Copy Markdown
Collaborator

Motivation

Part of Ngram refactoring series #21052
Following #22203

Miscellaneous behavioral fixes and improvements for multi-SAM support in #22203

Modifications

  • Added Ngram::resetStagingSam() C++ method and exposed via FFI as cancel_external_corpus_load(). On load failure, the Python wrapper now calls this instead of clear_external_corpus(), so previously loaded corpora are preserved when a new load fails.
  • add_external_corpus, remove_external_corpus, and list_external_corpora HTTP handlers now return an error early if speculative_algorithm != "NGRAM", instead of crashing.
  • All three HTTP handlers now call _Communicator.merge_results(results) to aggregate success/message across all DP ranks, instead of returning only results[0].
  • NgramCorpus now tracks per-corpus token counts and a running total, enforcing external_corpus_max_tokens as a global budget. Exceeding the limit rolls back the just-loaded corpus and raises ValueError. Removing a corpus frees its tokens from the budget.
  • Exposed remaining_token_budget property on both NgramCorpus and NGRAMWorker.
  • Added docstring to Ngram::reset() clarifying it preserves external corpora (sams_).
  • Added tests: test_remove_frees_token_budget, test_replace_corpus_respects_budget, test_error_on_load_preserves_existing_corpora to make sure corpus are not loaded beyond the external_corpus_max_tokens threshold.

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Apr 7, 2026
@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

/rerun-test test/registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 7, 2026

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

Copy link
Copy Markdown
Collaborator

@hnyls2002 hnyls2002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. NgramCorpus.load_external_corpus_named has partial replace handling (_corpus_token_counts.pop(corpus_id, 0)) and a dedicated test (test_replace_corpus_respects_budget), but replace is not a defined API semantic — HTTP layer auto-generates uuid, and C++ silently overwrites via std::move. Either reject duplicate corpus_id with an error, or explicitly support replace end-to-end.

  2. NGRAMWorker.remaining_corpus_token_budget is dead code — no caller in the codebase. Remove it or wire it into the list_external_corpora response.

@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

kpham-sgl commented Apr 8, 2026

NgramCorpus.load_external_corpus_named has partial replace handling (_corpus_token_counts.pop(corpus_id, 0)) and a dedicated test (test_replace_corpus_respects_budget), but replace is not a defined API semantic — HTTP layer auto-generates uuid, and C++ silently overwrites via std::move. Either reject duplicate corpus_id with an error, or explicitly support replace end-to-end.

@hnyls2002 Replace is actually a defined API semantic. HTTP layer only auto-generates corpus_id if it is not provided in the request body.

if not obj.corpus_id:
import uuid
obj.corpus_id = uuid.uuid4().hex

However one good point here is that replacing a corpus is a non-atomic action and correctly interacting with external_corpus_max_tokens is tricky (we currently have a bug here, a pre-existing corpus will be incorrectly deleted if its in-flight replacement brings the total token count > external_corpus_max_tokens). Hence, we probably want /add_external_corpus to be create-only.

In favor of the upcoming PR about SAM eviction I will remove all replace path. If user want to replace a corpus by corpus-id they can do so explicitly with/remove_external_corpus followed by /add_external_corpus

@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

@hnyls2002 some changes since last review

  • Remove replace part entirely. User need to call remove before add to replace a corpus
  • Remove NGRAMWorker.remaining_corpus_token_budget
  • commit_corpus_load after load background thread finishes to update some bookkeeping info (_corpus_token_counts, _total_loaded_tokens). Server startup load also now explicitly call commit_corpus_load
  • Modify unit tests for ^ behaviour
  • Add comments and one FIXME to prevent removing corpus during pending load. Planning to address this in the next SAM eviction PR

@kpham-sgl kpham-sgl requested a review from hnyls2002 April 8, 2026 23:11
@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

/rerun-test test/registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Copy link
Copy Markdown
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@hnyls2002 hnyls2002 merged commit f127d67 into sgl-project:main Apr 9, 2026
82 of 110 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants