[Spec][Ngram] Add output-as-corpus accept length benchmark for external SAM#22199
Conversation
Code Review
This pull request adds the `TestNgramExternalSamAcceptLength` test class to verify that N-gram speculative decoding with an external corpus significantly increases acceptance length. Feedback includes making server-statistics retrieval more robust for multi-GPU environments and a minor refactor of the baseline output-generation loop.
```python
def _get_accept_length(self, base_url):
    server_info = requests.get(base_url + "/get_server_info").json()
    return server_info["internal_states"][0]["avg_spec_accept_length"]
```
The `_get_accept_length` method assumes that `internal_states` is non-empty and that the first entry contains the `avg_spec_accept_length` key. While this is likely true for a single-GPU setup, it could fail with a `KeyError` or `IndexError` in multi-GPU (DP/PP) environments where stats might be distributed or not yet reported. Consider making this more robust by checking for the key's existence or averaging across all ranks.
Suggested change:

```python
def _get_accept_length(self, base_url):
    server_info = requests.get(base_url + "/get_server_info").json()
    states = server_info.get("internal_states", [])
    accept_lengths = [
        s["avg_spec_accept_length"] for s in states if "avg_spec_accept_length" in s
    ]
    return sum(accept_lengths) / len(accept_lengths) if accept_lengths else 0.0
```
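As a quick sanity check, here is a standalone sketch of the suggested robust extraction (written as a free function rather than a method, with mocked `/get_server_info` payloads; field names follow the snippet above, payload values are invented for illustration):

```python
# Hypothetical free-function version of the suggested _get_accept_length body,
# exercised against mock server_info payloads instead of a live server.
def get_accept_length(server_info):
    states = server_info.get("internal_states", [])
    accept_lengths = [
        s["avg_spec_accept_length"] for s in states if "avg_spec_accept_length" in s
    ]
    return sum(accept_lengths) / len(accept_lengths) if accept_lengths else 0.0

# Single-rank payload: same result as the original direct indexing.
single = {"internal_states": [{"avg_spec_accept_length": 2.5}]}
print(get_accept_length(single))  # 2.5

# Multi-rank payload where one rank has not reported stats yet.
multi = {"internal_states": [{"avg_spec_accept_length": 3.0}, {}]}
print(get_accept_length(multi))  # 3.0

# Empty payload: no KeyError/IndexError, just 0.0.
print(get_accept_length({}))  # 0.0
```

The original indexing version would raise on the last two payloads, which is the failure mode the review comment is guarding against.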
```python
generated_outputs = []
for _ in range(self.num_rounds):
    outputs = self._generate_batch(
        self.base_url, self.prompts, self.max_new_tokens
    )
    generated_outputs = outputs
```
In the baseline generation loop, `generated_outputs` is overwritten in each round. While `temperature=0` should produce deterministic results, it is generally safer to either accumulate all outputs or explicitly use the last round's results to ensure the corpus is fully populated as intended.
Suggested change:

```python
generated_outputs = []
for _ in range(self.num_rounds):
    generated_outputs = self._generate_batch(
        self.base_url, self.prompts, self.max_new_tokens
    )
```
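To make the two corpus-population strategies from the review concrete, here is a minimal sketch with `_generate_batch` stubbed out (the stub and its output strings are invented for illustration):

```python
# Stub standing in for the test's batched generation call.
def generate_batch(round_idx):
    return [f"output-{round_idx}-{i}" for i in range(2)]

num_rounds = 3

# Strategy 1: overwrite each round; the corpus keeps only the final round.
last_only = []
for r in range(num_rounds):
    last_only = generate_batch(r)

# Strategy 2: accumulate; the corpus keeps every round's outputs.
accumulated = []
for r in range(num_rounds):
    accumulated.extend(generate_batch(r))

print(len(last_only))    # 2
print(len(accumulated))  # 6
```

With deterministic (`temperature=0`) generation the rounds produce identical text, so the two strategies differ only in corpus size; the review's point is that the code should state which one is intended.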
Force-pushed from 6f726fa to fa62815
Force-pushed from fa62815 to 921b8c2
/rerun-test registered/spec/test_ngram_speculative_decoding.py

✅
[Spec][Ngram] Add output-as-corpus accept length benchmark for external SAM (sgl-project#22199)

Upstream SHA: be0277f
Cherry-picked from sgl-project/sglang
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation
Part of Ngram refactoring series #21052
Following #22203
There is currently no end-to-end test proving that the external SAM actually improves the speculative decoding accept length.
Modifications

- Add `test_output_as_corpus_boosts_accept_length` to `TestNgramSpeculativeDecodingFlashinfer`: it compares `avg_spec_accept_length` before and after a `POST /add_external_corpus` with the generated outputs as the corpus.
- Single server launch: reuses the existing flashinfer test class with `--speculative-ngram-external-sam-budget 8`.
- Also removes the redundant `TestNgramExternalSamSmoke` (SAM functionality is now covered by this benchmark plus unit tests).
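The benchmark flow described above can be sketched end to end with the HTTP layer stubbed out. The endpoint names (`/get_server_info`, `/add_external_corpus`) come from this PR; the `FakeServer` class, the `texts` payload field, and all numeric values are hypothetical stand-ins for illustration:

```python
# Illustrative sketch of the benchmark's shape: read the baseline accept
# length, register the generated outputs as an external corpus, then read
# the accept length again and expect an improvement.
class FakeServer:
    def __init__(self):
        self.corpus = []
        self.accept_length = 1.6  # pretend baseline without an external corpus

    def get(self, path):
        assert path == "/get_server_info"
        return {"internal_states": [{"avg_spec_accept_length": self.accept_length}]}

    def post(self, path, json):
        assert path == "/add_external_corpus"
        self.corpus.extend(json["texts"])
        self.accept_length = 3.2  # pretend corpus enables longer accepted drafts

server = FakeServer()
baseline = server.get("/get_server_info")["internal_states"][0]["avg_spec_accept_length"]
server.post("/add_external_corpus", json={"texts": ["generated output 1", "generated output 2"]})
boosted = server.get("/get_server_info")["internal_states"][0]["avg_spec_accept_length"]
print(baseline, boosted)  # 1.6 3.2
```

The real test drives a live SGLang server with `requests` instead of `FakeServer`, but the assertion at the end is the same idea: the post-corpus accept length must clearly exceed the baseline.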