
[Spec][Ngram] Add output-as-corpus accept length benchmark for external SAM #22199

Merged
hnyls2002 merged 1 commit into main from lsyin/ngram-sam-accept-length-benchmark on Apr 7, 2026

Conversation

@hnyls2002 (Collaborator) commented Apr 6, 2026

Motivation

Part of the Ngram refactoring series #21052, following #22203.

There is currently no end-to-end test proving that the external SAM actually improves speculative decoding accept length.

Modifications

Add test_output_as_corpus_boosts_accept_length to TestNgramSpeculativeDecodingFlashinfer:

  1. Generate outputs with temperature=0 (baseline, trie only), record avg_spec_accept_length
  2. POST /add_external_corpus with generated outputs as corpus
  3. Regenerate same prompts, assert SAM accept length >= 2x baseline

The test uses a single server launch, reusing the existing flashinfer test class with --speculative-ngram-external-sam-budget 8. It also removes the now-redundant TestNgramExternalSamSmoke, since SAM functionality is covered by this benchmark plus the unit tests.
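The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the `client` interface (`generate`, `post`, `accept_length`) and the `"texts"` payload key are hypothetical stand-ins for the test's real HTTP helpers.

```python
# Hedged sketch of the benchmark flow. `client` is a hypothetical
# stand-in for the test's HTTP helpers; the real test uses
# _generate_batch, POST /add_external_corpus, and /get_server_info.

def run_accept_length_benchmark(client, prompts, max_new_tokens=64):
    # 1. Baseline: greedy decoding with the trie-only drafter,
    #    recording avg_spec_accept_length afterwards.
    outputs = client.generate(prompts, max_new_tokens=max_new_tokens, temperature=0)
    baseline = client.accept_length()

    # 2. Feed the model's own outputs back as an external SAM corpus.
    #    ("texts" is an assumed payload key, not the documented API.)
    client.post("/add_external_corpus", {"texts": outputs})

    # 3. Regenerate the same prompts; with its own outputs in the corpus,
    #    the SAM can propose long exact continuations, so the accept
    #    length is asserted to at least double.
    client.generate(prompts, max_new_tokens=max_new_tokens, temperature=0)
    boosted = client.accept_length()
    assert boosted >= 2 * baseline, (baseline, boosted)
    return baseline, boosted
```

Because temperature=0 makes the regeneration byte-identical to the corpus, the drafter's proposals match exactly, which is what drives the 2x acceptance bound.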

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds the TestNgramExternalSamAcceptLength test class to verify that N-gram speculative decoding with an external corpus significantly increases acceptance length. Feedback includes improving the robustness of server statistics retrieval for multi-GPU environments and a minor refactor of the baseline output generation loop.

Comment on lines +213 to +215:

    def _get_accept_length(self, base_url):
        server_info = requests.get(base_url + "/get_server_info").json()
        return server_info["internal_states"][0]["avg_spec_accept_length"]

Severity: medium

The _get_accept_length method assumes that internal_states is non-empty and that the first entry contains the avg_spec_accept_length key. While this is likely true for a single-GPU setup, it could fail with a KeyError or IndexError in multi-GPU (DP/PP) environments where stats might be distributed or not yet reported. Consider making this more robust by checking for the key's existence or averaging across all ranks.

Suggested change:

    def _get_accept_length(self, base_url):
        server_info = requests.get(base_url + "/get_server_info").json()
        states = server_info.get("internal_states", [])
        accept_lengths = [
            s["avg_spec_accept_length"]
            for s in states
            if "avg_spec_accept_length" in s
        ]
        return sum(accept_lengths) / len(accept_lengths) if accept_lengths else 0.0

Comment on lines +228 to +233:

    generated_outputs = []
    for _ in range(self.num_rounds):
        outputs = self._generate_batch(
            self.base_url, self.prompts, self.max_new_tokens
        )
        generated_outputs = outputs

Severity: medium

In the baseline generation loop, generated_outputs is overwritten in each round. While temperature=0 should produce deterministic results, it is generally safer to either accumulate all outputs or explicitly use the last round's results to ensure the corpus is fully populated as intended.

Suggested change:

    generated_outputs = []
    for _ in range(self.num_rounds):
        generated_outputs = self._generate_batch(
            self.base_url, self.prompts, self.max_new_tokens
        )
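The review also mentions accumulating all rounds' outputs as an alternative to keeping only the last round. A sketch of that variant, where `generate_batch` is an illustrative stand-in for the test's `_generate_batch` helper:

```python
# Alternative to last-round overwrite: accumulate every round's outputs
# so the corpus covers all generated text. `generate_batch` here is an
# illustrative stand-in, not the actual test helper.
def collect_corpus(generate_batch, prompts, num_rounds, max_new_tokens):
    corpus = []
    for _ in range(num_rounds):
        corpus.extend(generate_batch(prompts, max_new_tokens))
    return corpus
```

With temperature=0 the rounds are deterministic, so accumulation only duplicates entries; its value is making the intent explicit rather than changing the corpus contents.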

@hnyls2002 hnyls2002 force-pushed the lsyin/ngram-sam-accept-length-benchmark branch 2 times, most recently from 6f726fa to fa62815 (April 7, 2026 01:53), then from fa62815 to 921b8c2 (April 7, 2026 01:56)
@hnyls2002 (Collaborator, Author) commented:

/rerun-test registered/spec/test_ngram_speculative_decoding.py

@github-actions github-actions bot commented Apr 7, 2026

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@hnyls2002 hnyls2002 merged commit be0277f into main Apr 7, 2026
62 of 68 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/ngram-sam-accept-length-benchmark branch April 7, 2026 02:09
carlosfundora pushed a commit to carlosfundora/sglang-1-bit-turbo that referenced this pull request Apr 8, 2026
…h benchmark for external SAM (sgl-project#22199)

Upstream SHA: be0277f
Cherry-picked from sgl-project/sglang

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
