[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton by kpham-sgl · Pull Request #21425 · sgl-project/sglang

kpham-sgl · 2026-03-25T21:46:11Z

Motivation

Part of Ngram refactoring series #21052
Following #21243
We want to support loading an external corpus and constructing a Suffix Automaton for suffix matching during drafting.

Modifications

Add optional external-corpus support to NGRAM speculative decoding with --speculative-ngram-external-corpus-path, --speculative-ngram-external-sam-budget, and speculative_ngram_external_corpus_max_tokens
Stream the corpus in chunks (read chunk -> tokenize -> SAM.extend()) and pipeline these stages, so that pybind does not materialize the whole corpus in memory twice
Use speculative_ngram_external_corpus_max_tokens as a proxy limit to avoid CPU OOM during SAM construction (SAM memory overhead scales as ~ 2 * external_corpus_tokens_count; a suffix automaton over n tokens has at most 2n - 1 states)
Use the SAM to generate external-corpus continuations in both Recency and Frequency modes, while reserving a fixed portion of the draft budget for SAM-backed candidates.
Merge trie and SAM candidates into the same speculative tree with low overhead
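
For readers unfamiliar with the data structure, here is a minimal Python sketch of the streaming build and of a suffix-match lookup. It is not the C++ implementation in this PR: the class layout, `build_from_chunks`, `continuation`, and the choice to truncate at `max_tokens` are all assumptions made for illustration, and the real code may handle the token limit and drafting differently.

```python
from typing import Dict, Iterable, List


class SuffixAutomaton:
    """Illustrative suffix automaton over integer token ids."""

    def __init__(self) -> None:
        # State 0 is the initial state; a link of -1 means "no suffix link".
        self.next: List[Dict[int, int]] = [{}]
        self.link: List[int] = [-1]
        self.length: List[int] = [0]
        self.first_end: List[int] = [-1]  # end index of each state's first occurrence
        self.tokens: List[int] = []
        self.last = 0

    def _new_state(self, length: int, link: int, trans: Dict[int, int], first_end: int) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        self.first_end.append(first_end)
        return len(self.length) - 1

    def extend(self, tok: int) -> None:
        """Append one token (standard online construction)."""
        pos = len(self.tokens)
        self.tokens.append(tok)
        cur = self._new_state(self.length[self.last] + 1, -1, {}, pos)
        p = self.last
        while p != -1 and tok not in self.next[p]:
            self.next[p][tok] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][tok]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone keeps q's transitions and first occurrence.
                clone = self._new_state(
                    self.length[p] + 1, self.link[q], dict(self.next[q]), self.first_end[q]
                )
                while p != -1 and self.next[p].get(tok) == q:
                    self.next[p][tok] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def continuation(self, context: List[int], max_len: int) -> List[int]:
        """Tokens that follow the first corpus occurrence of the longest matching suffix."""
        state, matched = 0, 0
        for tok in context:
            while state != 0 and tok not in self.next[state]:
                state = self.link[state]
                matched = self.length[state]
            if tok in self.next[state]:
                state = self.next[state][tok]
                matched += 1
            else:
                matched = 0
        if matched == 0:
            return []
        start = self.first_end[state] + 1
        return self.tokens[start:start + max_len]


def build_from_chunks(chunks: Iterable[List[int]], max_tokens: int) -> SuffixAutomaton:
    """Feed tokenized chunks one at a time; stop at max_tokens (assumed policy)."""
    sam = SuffixAutomaton()
    for chunk in chunks:
        for tok in chunk:
            if len(sam.tokens) >= max_tokens:
                return sam
            sam.extend(tok)
    return sam
```

Because `chunks` can be a generator that tokenizes lazily, only one chunk of token ids needs to be alive at a time besides the automaton itself. The state count stays within twice the token count, which is the bound the token limit above relies on.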

Accuracy Tests

Verify with these tests locally

pytest test/registered/unit/server_args/test_server_args.py::TestNgramExternalSamArgs
pytest test/registered/spec/utils/test_ngram_corpus.py
pytest test/registered/spec/test_ngram_speculative_decoding.py

Benchmarking and Profiling

This change increases server startup time. Here is a quick script that benchmarks the startup overhead against external corpus size:
https://gist.github.com/kpham-sgl/00c547a6fb7ab431ad2a09e42e371bd2

python benchmark_external_corpus_overhead.py 
Trials per config: 3
Corpus sizes (tokens): [1000000, 5000000, 10000000]
Tokenizer: meta-llama/Llama-3.1-8B-Instruct

Running baseline (no external corpus)...
Running benchmark: ~1,000,000 tokens...
Running benchmark: ~5,000,000 tokens...
Running benchmark: ~10,000,000 tokens...

================================================================================
External Corpus Load + SAM Construction Overhead
================================================================================
  corpus_tokens |  load_time_mean(s) |  load_time_std | mem_delta_mean(MB) |  mem_delta_std
-------------------------------------------------------------------------------------------
   0 (baseline) |             0.0828 |         0.0046 |             177.85 |           0.04
      1,000,000 |             2.6590 |         0.0418 |             398.58 |           0.35
      5,000,000 |            14.7094 |         0.1811 |            1845.80 |           0.19
     10,000,000 |            31.1640 |         0.1667 |            3846.36 |           0.40

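Load time and memory both grow with corpus size (about 31 s and a 3.8 GB memory delta at 10M tokens). We can look into optimizing this in later PRs.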
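
To put the table in per-token terms, the sketch below subtracts the baseline row from each measurement and divides by the corpus size. Treating the baseline as a fixed cost and reading MB as 10**6 bytes are assumptions, and `marginal_cost` is a helper invented here, not part of the benchmark script.

```python
# (corpus_tokens, load_time_mean_s, mem_delta_mean_mb), copied from the table above.
BASELINE = (0, 0.0828, 177.85)
ROWS = [
    (1_000_000, 2.6590, 398.58),
    (5_000_000, 14.7094, 1845.80),
    (10_000_000, 31.1640, 3846.36),
]


def marginal_cost(row, baseline=BASELINE):
    """Return (microseconds per token, bytes per token) above the baseline."""
    tokens, load_s, mem_mb = row
    us_per_token = (load_s - baseline[1]) / tokens * 1e6
    bytes_per_token = (mem_mb - baseline[2]) * 1e6 / tokens  # assumes MB = 10**6 bytes
    return us_per_token, bytes_per_token


for row in ROWS:
    us, nbytes = marginal_cost(row)
    print(f"{row[0]:>12,} tokens: {us:.2f} us/token, {nbytes:.0f} bytes/token")
```

Under these assumptions the cost is roughly 2.6 to 3.1 microseconds and a few hundred bytes per token, with both figures rising as the corpus grows.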

[TODO] Run a speculative decoding benchmark with an external corpus dataset to see how much this speeds up Ngram speculative decoding

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-03-25T21:46:15Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Move external corpus ingestion into the suffix automaton constructor so the SAM lifecycle matches one-time startup loading and no longer exposes a separate mutable build step. Made-with: Cursor

gemini-code-assist · 2026-03-28T03:42:12Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Adapt SAM external corpus feature to the new ngram architecture:
- Move suffix_automaton.{h,cpp} to jit_kernel/csrc/ngram_corpus/
- Use TVM FFI instead of pybind11 for SAM loading methods
- Change match_state_ key from std::string to int64_t state_ids
- Add SAM budget splitting to stateful batchMatch overload
- Wire external corpus params through FFI constructor
- Keep both stateless and stateful batchMatch overloads
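
As a rough illustration of the budget splitting and candidate merging mentioned above, here is a Python sketch. The names `split_budget` and `merge_into_tree`, the SAM-first insertion order, and the rule that an already-present prefix costs nothing are all assumptions; the actual batchMatch logic in this PR may differ.

```python
from typing import Dict, Sequence, Tuple


def split_budget(total: int, sam_budget: int) -> Tuple[int, int]:
    """Return (trie_budget, sam_budget), clamping the SAM share to [0, total]."""
    sam = max(0, min(sam_budget, total))
    return total - sam, sam


def merge_into_tree(
    trie_paths: Sequence[Sequence[int]],
    sam_paths: Sequence[Sequence[int]],
    total: int,
    sam_budget: int,
) -> Dict[Tuple[int, ...], str]:
    """Merge candidate token paths into one prefix tree.

    Each key is a node, identified by its token path from the root; the value
    records which source created it. Shared prefixes are stored only once.
    """
    trie_cap, sam_cap = split_budget(total, sam_budget)
    tree: Dict[Tuple[int, ...], str] = {}

    def add(paths: Sequence[Sequence[int]], source: str, cap: int) -> None:
        used = 0
        for path in paths:
            for i in range(1, len(path) + 1):
                node = tuple(path[:i])
                if node in tree:
                    continue  # prefix already in the tree, no budget spent
                if used == cap:
                    break  # out of budget; deeper nodes would be orphaned
                tree[node] = source
                used += 1

    add(sam_paths, "sam", sam_cap)
    add(trie_paths, "trie", trie_cap)
    return tree
```

This sketch does not roll unused SAM budget over to the trie side; whether the real implementation does so is not stated here.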

hnyls2002 · 2026-04-06T06:45:48Z

/rerun-test test_ngram_corpus.py test_server_args.py test_ngram_speculative_decoding.py

hnyls2002 · 2026-04-06T06:46:04Z

/tag-and-rerun-ci

github-actions · 2026-04-06T06:46:29Z

✅ ubuntu-latest (2 tests): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py
cd test/ && python3 registered/unit/server_args/test_server_args.py

✅ 1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

kpham-sgl added 7 commits March 23, 2026 19:59

remove min max match window

4926b37

lint

d30835f

misc

36ed24c

increment anchor after every decode steps instead of rematching

0d1d60f

lint

1951002

nit

0e2d349

sam initial commit

7ad0377

kpham-sgl requested review from Ying1123, hnyls2002 and merrymercy as code owners March 25, 2026 21:46

kpham-sgl changed the title ~~[WIP[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton~~ [WIP][Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton Mar 25, 2026

github-actions bot added the documentation, lora, and speculative-decoding labels Mar 25, 2026

kpham-sgl marked this pull request as draft March 25, 2026 21:46

kpham-sgl added 6 commits March 26, 2026 17:56

rename and lint

eee16c2

construct sam during external corpus load

a9c0253

e2e test for the SAM

3b22415

better external corpus load mechanism

7f34221

misc fix

d499e2b

add chunking support

a70af7b

kpham-sgl mentioned this pull request Mar 28, 2026

[Roadmap] Further Ngram Speculative Decoding Support #21052

Open

13 tasks

kpham-sgl added 6 commits March 28, 2026 01:48

change spec tree draft merging algorithm

ff32174

minor server args update

ec3f831

remove cpp token limit backstop

e65bd24

minor change to tree merging algo

b948892

misc

93faac5

update some comments

7a61d97

kpham-sgl force-pushed the kp/SAM-for-external-corpus branch from 2d5cfef to 7a61d97 Compare March 28, 2026 03:39

kpham-sgl changed the title ~~[WIP][Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton~~ [Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Automaton Mar 28, 2026

kpham-sgl marked this pull request as ready for review March 28, 2026 03:42

hnyls2002 requested review from BBuf, DarkSharpness, HydraQYH, celve and yuan-luo as code owners April 6, 2026 06:37

github-actions bot added the jit-kernel label Apr 6, 2026

remove duplicate suffix_automaton files from old cpp_ngram location

bc8137e

hnyls2002 added the high priority label Apr 6, 2026

github-actions bot added the run-ci label Apr 6, 2026

hnyls2002 merged commit 12272b6 into sgl-project:main Apr 6, 2026
97 of 162 checks passed

This was referenced Apr 6, 2026

[Spec][Ngram] Add output-as-corpus accept length benchmark for external SAM #22199

Merged

[Spec][Ngram] Support multiple SAMs with dynamic HTTP API #22203

Merged

JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026

[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Aut…

7f31fc3

…omaton (sgl-project#21425)

Fridge003 pushed a commit that referenced this pull request Apr 7, 2026

[Spec][Ngram] 6/N: Load an external corpus and construct a Suffix Aut…

3e111d7

…omaton (#21425)

hnyls2002 merged 21 commits into sgl-project:main from kpham-sgl:kp/SAM-for-external-corpus

2 participants
