[Spec][Ngram] 5/N: Store and advance anchor match state across decode steps#21243

Merged
hnyls2002 merged 8 commits into sgl-project:main from
kpham-sgl:kp/maintain-per-anchor-matching-state-across-decode-steps
Apr 6, 2026

@kpham-sgl (Collaborator) commented Mar 24, 2026

Motivation

Part of Ngram refactoring series #21052
Following #21225

Previously, match() cost O(D^2) at every decode step, where D is max_trie_depth, because every suffix match was rebuilt from the trie root. Since decoding only ever appends to the sequence, we can instead keep the list of MatchState objects (one per anchor) from the previous decode step and advance each anchor in O(1) per appended token.
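To make the complexity argument concrete, here is a minimal Python sketch (not the PR's C++ code) using a plain dict-based trie. `stateless_anchors` re-walks every suffix of length 1..D from the root, which is the O(D^2) pattern described above; `advance_anchors` instead extends each cached anchor with a single child lookup when a token is appended. All names are hypothetical, and the trie is assumed to stay unchanged across the simulated decode steps.

```python
from typing import Dict, List, Optional


class TrieNode:
    """Hypothetical trie node; children keyed by token id."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}


def insert(root: TrieNode, tokens: List[int]) -> None:
    node = root
    for t in tokens:
        node = node.children.setdefault(t, TrieNode())


def walk(root: TrieNode, tokens: List[int]) -> Optional[TrieNode]:
    node: Optional[TrieNode] = root
    for t in tokens:
        node = node.children.get(t)
        if node is None:
            return None
    return node


def stateless_anchors(root: TrieNode, tokens: List[int], max_depth: int) -> Dict[int, TrieNode]:
    # One root-to-node walk per suffix length d = 1..D: O(D^2) lookups per step.
    anchors: Dict[int, TrieNode] = {}
    for d in range(1, min(max_depth, len(tokens)) + 1):
        node = walk(root, tokens[-d:])
        if node is not None:
            anchors[d] = node
    return anchors


def advance_anchors(root: TrieNode, anchors: Dict[int, TrieNode], token: int, max_depth: int) -> Dict[int, TrieNode]:
    # Extend each cached anchor by one child lookup; the root is the depth-0 anchor.
    new: Dict[int, TrieNode] = {}
    for d, node in [(0, root), *anchors.items()]:
        child = node.children.get(token)
        if d < max_depth and child is not None:
            new[d + 1] = child
    return new


root = TrieNode()
for seq in ([1, 2, 3, 4], [2, 3, 5], [3, 4, 1]):
    insert(root, seq)

history = [1, 2]
anchors = stateless_anchors(root, history, max_depth=3)
for tok in [3, 4, 1]:  # tokens appended by successive decode steps
    history.append(tok)
    anchors = advance_anchors(root, anchors, tok, max_depth=3)
    assert anchors == stateless_anchors(root, history, max_depth=3)
```

The equivalence holds because a suffix of length d+1 can only match if its length-d prefix matched, so advancing the cached anchors (plus a fresh anchor from the root) finds exactly the set the full rebuild would.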

Modifications

  • Keep per-request NGRAM anchor state across decode steps instead of rebuilding every suffix match from trie root each time.
  • Add MatchState plus versioned NodeRef so cached anchors can be advanced safely across decode steps and invalidated correctly after eviction or reset.
  • Make Trie::match() stateful: infer the appended suffix from the current tail and total_len, advance cached anchors when valid, and rebuild when state is stale.
  • Preserve existing BFS / PROB draft construction behavior while swapping anchor collection to the new stateful matcher.
  • Move per-request match-state ownership into Ngram, keyed by req_id, with explicit cleanup on request finish/reset.
  • Simplify the public matching API to batchMatch(req_ids, tokens, total_lens) / batch_get(req_ids, batch_tokens, total_lens) by removing explicit appended-token plumbing.
  • Simplify NGRAMWorker integration so it only passes the trimmed tail and full request length, without mutating Req or keeping extra Python-side match state.
  • Add regression coverage for incremental-vs-stateless equivalence, leaf-anchor expansion, and stale-state rebuild after eviction.
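The bullets above combine three ideas: version-stamped node references, inferring the appended tokens from the tail and total_len, and a full rebuild when cached state cannot be trusted. The Python sketch below illustrates one way they could fit together; it is an assumption-laden model, not the PR's `Trie::match()`. `Node.version`, `NodeRef`, `StatefulMatcher`, `evict`, and `erase` are hypothetical names, eviction is modeled by bumping the version of every node in a detached subtree, and trie insertions between steps are not tracked.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class Node:
    """Hypothetical trie node; `version` is bumped when the node is evicted."""

    def __init__(self) -> None:
        self.children: Dict[int, "Node"] = {}
        self.version = 0


@dataclass
class NodeRef:
    node: Node
    version: int  # node.version captured when this ref was taken

    def valid(self) -> bool:
        return self.node.version == self.version


@dataclass
class MatchState:
    total_len: int  # request length when the anchors were last updated
    anchors: Dict[int, NodeRef] = field(default_factory=dict)  # depth -> ref


class StatefulMatcher:
    def __init__(self, max_depth: int) -> None:
        self.root = Node()
        self.max_depth = max_depth
        self.states: Dict[int, MatchState] = {}  # keyed by req_id

    def insert(self, tokens: List[int]) -> None:
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, Node())

    def evict(self, parent: Node, token: int) -> None:
        # Detach a subtree and bump every version in it so cached refs go stale.
        stack = [parent.children.pop(token)]
        while stack:
            node = stack.pop()
            node.version += 1
            stack.extend(node.children.values())

    def _ref(self, node: Node) -> NodeRef:
        return NodeRef(node, node.version)

    def _rebuild(self, tail: List[int]) -> Dict[int, NodeRef]:
        # Stateless fallback: walk every suffix of the tail from the root.
        anchors: Dict[int, NodeRef] = {}
        for d in range(1, min(self.max_depth, len(tail)) + 1):
            node: Optional[Node] = self.root
            for t in tail[-d:]:
                node = node.children.get(t)
                if node is None:
                    break
            if node is not None:
                anchors[d] = self._ref(node)
        return anchors

    def _advance(self, anchors: Dict[int, NodeRef], token: int) -> Dict[int, NodeRef]:
        # O(1) child lookup per anchor; the root acts as the depth-0 anchor.
        new: Dict[int, NodeRef] = {}
        for d, ref in [(0, self._ref(self.root)), *anchors.items()]:
            child = ref.node.children.get(token)
            if d < self.max_depth and child is not None:
                new[d + 1] = self._ref(child)
        return new

    def match(self, req_id: int, tail: List[int], total_len: int) -> Dict[int, Node]:
        state = self.states.get(req_id)
        n_new = total_len - state.total_len if state is not None else -1
        stale = (
            state is None
            or not 0 <= n_new <= len(tail)
            or not all(ref.valid() for ref in state.anchors.values())
        )
        if stale:
            anchors = self._rebuild(tail)
        else:
            anchors = state.anchors
            for t in tail[len(tail) - n_new:]:  # infer the appended tokens
                anchors = self._advance(anchors, t)
        self.states[req_id] = MatchState(total_len, anchors)
        return {d: ref.node for d, ref in anchors.items()}

    def erase(self, req_id: int) -> None:
        self.states.pop(req_id, None)  # explicit cleanup on finish/reset
```

In this sketch, any invalid anchor forces a rebuild of that request's whole state, which mirrors the "rebuild when state is stale" behavior described above at the coarsest granularity.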

Accuracy Tests

Passed python3 -m pytest -q test/registered/spec/utils/test_ngram_corpus.py and python3 -m pytest -q test/registered/spec/test_ngram_speculative_decoding.py

Benchmarking and Profiling

[TODO]

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

…-decode-steps

Resolve merge conflicts by adapting the stateful ngram MatchState feature
to upstream's TVM FFI binding (which replaces pybind11):

- trie.h/trie.cpp: keep HEAD's MatchState, NodeRef, and incremental
  anchor advancement logic (auto-resolved)
- ngram.h/ngram.cpp: add stateless batchMatch(tokens) overload for FFI,
  switch stateful overload to int64_t keys (FFI-compatible)
- ngram_corpus_ffi.cpp: add batch_match_stateful and erase_match_state
  FFI methods
- jit_kernel/ngram_corpus.py: add match_stateful and erase_states to
  the FFI wrapper
- srt/speculative/cpp_ngram/ngram_corpus.py: use get_ngram_corpus_cls()
  with a req_id-to-state_id mapping layer for the stateful path
- Remove old pybind11 ngram_corpus_binding.cpp (superseded by FFI)
- test_ngram_corpus.py: resolve to use _batch_get helper, remove stale
  _raw_batch_match

Made-with: Cursor
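The req_id-to-state_id mapping layer mentioned in this commit message can be illustrated with a small Python sketch. Because the stateful overload now uses `int64_t` keys (the later SAM commit notes they were previously `std::string`), the Python side has to translate request ids into integer state ids. `StateIdMap`, `get`, and `release` are hypothetical names; the PR's actual `match_stateful` and `erase_states` wrappers are not reproduced because their signatures are not shown here.

```python
from typing import Dict, Iterable, List


class StateIdMap:
    """Hypothetical helper: assigns int64-compatible state ids to string req_ids."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._next_id = 0  # monotonic, so a released id is never handed out again

    def get(self, req_id: str) -> int:
        sid = self._ids.get(req_id)
        if sid is None:
            sid = self._next_id
            self._ids[req_id] = sid
            self._next_id += 1
        return sid

    def release(self, req_ids: Iterable[str]) -> List[int]:
        # Forget finished/reset requests and return their ids so the caller
        # can ask the native side to erase the matching state.
        return [self._ids.pop(r) for r in req_ids if r in self._ids]


ids = StateIdMap()
batch = ["req-a", "req-b", "req-a"]
state_ids = [ids.get(r) for r in batch]  # [0, 1, 0]
```

Using a monotonic counter rather than recycling freed ids means that, in this sketch, a request that reuses a req_id string after cleanup cannot inherit stale native state.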
@hnyls2002 (Collaborator)

/rerun-test test_ngram_corpus test_ngram_speculative_decoding

@github-actions bot commented Apr 6, 2026

test_ngram_corpus: No test file found matching test_ngram_corpus under test/registered/.

test_ngram_speculative_decoding: No test file found matching test_ngram_speculative_decoding under test/registered/.

@hnyls2002 (Collaborator)

/rerun-test test/registered/spec/utils/test_ngram_corpus.py test/registered/spec/test_ngram_speculative_decoding.py

@github-actions bot commented Apr 6, 2026

1-gpu-5090: View workflow run

cd test/ && python3 registered/spec/utils/test_ngram_corpus.py

1-gpu-h100: View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@hnyls2002 (Collaborator)

/tag-and-rerun-ci

@github-actions github-actions bot added the run-ci label Apr 6, 2026
@hnyls2002 hnyls2002 merged commit b2008bf into sgl-project:main Apr 6, 2026
101 of 156 checks passed
hnyls2002 added a commit to kpham-sgl/sglang that referenced this pull request Apr 6, 2026
Adapt SAM external corpus feature to the new ngram architecture:
- Move suffix_automaton.{h,cpp} to jit_kernel/csrc/ngram_corpus/
- Use TVM FFI instead of pybind11 for SAM loading methods
- Change match_state_ key from std::string to int64_t state_ids
- Add SAM budget splitting to stateful batchMatch overload
- Wire external corpus params through FFI constructor
- Keep both stateless and stateful batchMatch overloads
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026