[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton#21425

Merged
hnyls2002 merged 21 commits into sgl-project:main from kpham-sgl:kp/SAM-for-external-corpus
Apr 6, 2026
@kpham-sgl
Collaborator

@kpham-sgl kpham-sgl commented Mar 25, 2026

Motivation

Part of Ngram refactoring series #21052
Following #21243
We want to support loading an external corpus and constructing a Suffix Automaton for suffix matching during drafting.

Modifications

  • Add optional external-corpus support to NGRAM speculative decoding with --speculative-ngram-external-corpus-path, --speculative-ngram-external-sam-budget, and speculative_ngram_external_corpus_max_tokens
  • Stream corpus chunks -> tokenize -> SAM.extend() (and pipeline these stages) so that Pybind does not materialize the large corpus in memory twice
  • Use speculative_ngram_external_corpus_max_tokens as a proxy to avoid CPU OOM during SAM construction (SAM memory overhead is ~ 2 * external_corpus_tokens_count)
  • Use the SAM to generate external-corpus continuations in both Recency and Frequency modes, while reserving a fixed portion of the draft budget for SAM-backed candidates.
  • Merge trie and SAM candidates into the same speculative tree with low overhead
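The chunked build described in the list above can be sketched in Python. This is a minimal illustration of the textbook online suffix automaton, not the C++ implementation in this PR: the class `SuffixAutomaton`, the helpers `stream_token_chunks` and `build_sam`, and the character-level `toy_tokenize` are all hypothetical names introduced here, and the pipelining of stages is left out. The cap argument plays the role of speculative_ngram_external_corpus_max_tokens.

```python
from typing import Callable, Dict, Iterable, Iterator, List


class SuffixAutomaton:
    """Textbook online suffix automaton over integer token ids (illustrative only)."""

    def __init__(self) -> None:
        # Parallel per-state arrays; state 0 is the root (empty string).
        self.next: List[Dict[int, int]] = [{}]
        self.link: List[int] = [-1]       # suffix links
        self.length: List[int] = [0]      # longest string represented by each state
        self.first_end: List[int] = [-1]  # end index of the first occurrence
        self.tokens: List[int] = []       # corpus tokens seen so far
        self.last = 0

    def _new_state(self, length: int, link: int, trans: Dict[int, int], first_end: int) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        self.first_end.append(first_end)
        return len(self.next) - 1

    def extend(self, chunk: Iterable[int]) -> None:
        """Append one chunk of token ids; can be called once per streamed chunk."""
        for tok in chunk:
            self.tokens.append(tok)
            cur = self._new_state(self.length[self.last] + 1, -1, {}, len(self.tokens) - 1)
            p = self.last
            while p != -1 and tok not in self.next[p]:
                self.next[p][tok] = cur
                p = self.link[p]
            if p == -1:
                self.link[cur] = 0
            else:
                q = self.next[p][tok]
                if self.length[p] + 1 == self.length[q]:
                    self.link[cur] = q
                else:
                    # Split q: the clone copies q's transitions and first occurrence.
                    clone = self._new_state(
                        self.length[p] + 1, self.link[q], dict(self.next[q]), self.first_end[q]
                    )
                    while p != -1 and self.next[p].get(tok) == q:
                        self.next[p][tok] = clone
                        p = self.link[p]
                    self.link[q] = clone
                    self.link[cur] = clone
            self.last = cur

    def continuation(self, context: List[int], max_tokens: int) -> List[int]:
        """Tokens that follow the first corpus occurrence of the longest matching suffix of context."""
        state, matched = 0, 0
        for tok in context:
            # Follow suffix links until the current match can be extended by tok.
            while state != 0 and tok not in self.next[state]:
                state = self.link[state]
                matched = self.length[state]
            if tok in self.next[state]:
                state = self.next[state][tok]
                matched += 1
            else:
                matched = 0
        if matched == 0:
            return []
        start = self.first_end[state] + 1
        return self.tokens[start:start + max_tokens]


def stream_token_chunks(
    text_chunks: Iterable[str], tokenize: Callable[[str], List[int]], max_tokens: int
) -> Iterator[List[int]]:
    """Tokenize one chunk at a time and stop once the token cap is reached."""
    seen = 0
    for text in text_chunks:
        if seen >= max_tokens:
            return
        ids = tokenize(text)[: max_tokens - seen]
        seen += len(ids)
        if ids:
            yield ids


def build_sam(
    text_chunks: Iterable[str], tokenize: Callable[[str], List[int]], max_tokens: int
) -> SuffixAutomaton:
    sam = SuffixAutomaton()
    for ids in stream_token_chunks(text_chunks, tokenize, max_tokens):
        sam.extend(ids)  # only one chunk of token ids is held outside the SAM at a time
    return sam


def toy_tokenize(text: str) -> List[int]:
    # Stand-in for a real tokenizer: one token per character.
    return [ord(ch) for ch in text]


sam = build_sam(["the cat sat ", "on the mat"], toy_tokenize, max_tokens=1_000)
draft = sam.continuation(toy_tokenize("xx the c"), max_tokens=3)
print("".join(map(chr, draft)))
```

Because each chunk is tokenized and consumed before the next one is read, the full token list never has to exist as a separate Python object alongside the automaton, which is the point of the streaming design.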

Accuracy Tests

Verify locally with these tests:

pytest test/registered/unit/server_args/test_server_args.py::TestNgramExternalSamArgs
pytest test/registered/spec/utils/test_ngram_corpus.py
pytest test/registered/spec/test_ngram_speculative_decoding.py

Benchmarking and Profiling

This change affects server start time. Here is a quick script that benchmarks the startup overhead as a function of external corpus size:
https://gist.github.com/kpham-sgl/00c547a6fb7ab431ad2a09e42e371bd2

python benchmark_external_corpus_overhead.py 
Trials per config: 3
Corpus sizes (tokens): [1000000, 5000000, 10000000]
Tokenizer: meta-llama/Llama-3.1-8B-Instruct

Running baseline (no external corpus)...
Running benchmark: ~1,000,000 tokens...
Running benchmark: ~5,000,000 tokens...
Running benchmark: ~10,000,000 tokens...

================================================================================
External Corpus Load + SAM Construction Overhead
================================================================================
  corpus_tokens |  load_time_mean(s) |  load_time_std | mem_delta_mean(MB) |  mem_delta_std
-------------------------------------------------------------------------------------------
   0 (baseline) |             0.0828 |         0.0046 |             177.85 |           0.04
      1,000,000 |             2.6590 |         0.0418 |             398.58 |           0.35
      5,000,000 |            14.7094 |         0.1811 |            1845.80 |           0.19
     10,000,000 |            31.1640 |         0.1667 |            3846.36 |           0.40

We can look into optimizing this in later PRs
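To read the table above as marginal cost per corpus token, the baseline row can be subtracted from each measurement. The snippet below is only arithmetic on the reported means; the function name `marginal_cost` is hypothetical, and it assumes the MB column means 10^6 bytes, which the benchmark output does not state.

```python
# (corpus_tokens, load_time_mean_s, mem_delta_mean_mb), copied from the table above.
BASELINE = (0, 0.0828, 177.85)
ROWS = [
    (1_000_000, 2.6590, 398.58),
    (5_000_000, 14.7094, 1845.80),
    (10_000_000, 31.1640, 3846.36),
]


def marginal_cost(row, baseline=BASELINE):
    """Return (microseconds per token, bytes per token) above the no-corpus baseline."""
    tokens, load_s, mem_mb = row
    us_per_token = (load_s - baseline[1]) / tokens * 1e6
    # Assumption: 1 MB = 10**6 bytes; with MiB the byte figures would be about 4.9% higher.
    bytes_per_token = (mem_mb - baseline[2]) * 1e6 / tokens
    return us_per_token, bytes_per_token


for row in ROWS:
    us, nbytes = marginal_cost(row)
    print(f"{row[0]:>12,} tokens: {us:.2f} us/token, {nbytes:.0f} bytes/token")
```

Under that assumption, both per-token figures grow with corpus size in these three measurements, so a linear extrapolation from the smallest run would underestimate the cost of larger corpora.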

[TODO] Run a speculative decoding benchmark with an external corpus dataset to see how much this speeds up Ngram speculative decoding

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@kpham-sgl kpham-sgl changed the title [WIP[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton [WIP][Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton Mar 25, 2026
@github-actions github-actions bot added the documentation, lora, and speculative-decoding labels Mar 25, 2026
@kpham-sgl kpham-sgl marked this pull request as draft March 25, 2026 21:46
Move external corpus ingestion into the suffix automaton constructor so the SAM lifecycle matches one-time startup loading and no longer exposes a separate mutable build step.

Made-with: Cursor
@kpham-sgl kpham-sgl force-pushed the kp/SAM-for-external-corpus branch from 2d5cfef to 7a61d97 Compare March 28, 2026 03:39
@kpham-sgl kpham-sgl changed the title [WIP][Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton [Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton Mar 28, 2026
@kpham-sgl kpham-sgl marked this pull request as ready for review March 28, 2026 03:42
Adapt SAM external corpus feature to the new ngram architecture:
- Move suffix_automaton.{h,cpp} to jit_kernel/csrc/ngram_corpus/
- Use TVM FFI instead of pybind11 for SAM loading methods
- Change match_state_ key from std::string to int64_t state_ids
- Add SAM budget splitting to stateful batchMatch overload
- Wire external corpus params through FFI constructor
- Keep both stateless and stateful batchMatch overloads
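The budget splitting and candidate merging mentioned in this commit can be pictured with a small Python sketch. It is not the batchMatch code from the PR: `split_draft_budget` and `merge_into_tree` are hypothetical names, and it assumes the SAM budget is a count of draft tokens carved out of the total draft budget, which the PR text does not spell out.

```python
from typing import Dict, List, Tuple

Tree = Dict[int, "Tree"]  # nested dict: token id -> subtree


def split_draft_budget(total_budget: int, sam_budget: int) -> Tuple[int, int]:
    """Reserve up to sam_budget draft tokens for SAM candidates; the trie gets the rest."""
    sam = max(0, min(sam_budget, total_budget))
    return total_budget - sam, sam


def merge_into_tree(
    trie_paths: List[List[int]],
    sam_paths: List[List[int]],
    total_budget: int,
    sam_budget: int,
) -> Tree:
    """Insert candidate token paths from both sources into one prefix tree.

    Only newly created nodes are charged against a source's budget, so a SAM
    path that shares a prefix with a trie path reuses those nodes for free.
    """
    trie_left, sam_left = split_draft_budget(total_budget, sam_budget)
    tree: Tree = {}

    def insert(path: List[int], budget: int) -> int:
        node = tree
        for tok in path:
            if tok not in node:
                if budget == 0:
                    break  # out of budget: drop the rest of this path
                node[tok] = {}
                budget -= 1
            node = node[tok]
        return budget

    for path in trie_paths:
        trie_left = insert(path, trie_left)
    for path in sam_paths:
        sam_left = insert(path, sam_left)
    return tree


tree = merge_into_tree(
    trie_paths=[[1, 2, 3]],
    sam_paths=[[1, 2, 4], [5]],
    total_budget=5,
    sam_budget=2,
)
print(tree)
```

Sharing prefix nodes is one plausible way to keep the merge cheap, since the verifier then sees a single tree rather than two candidate sets.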
@hnyls2002
Collaborator

/rerun-test test_ngram_corpus.py test_server_args.py test_ngram_speculative_decoding.py

@hnyls2002
Collaborator

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

ubuntu-latest (2 tests): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py
cd test/ && python3 registered/unit/server_args/test_server_args.py

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@hnyls2002 hnyls2002 merged commit 12272b6 into sgl-project:main Apr 6, 2026
97 of 162 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026