Unify think_end_id to model_config as single source of truth#22148
Conversation
Move reasoning token tracking into V1 verify phase (eagle_info, ngram_info) so it stays alongside output_ids/check_finished/grammar. Isolate V1's post-processing block with early continue, making the V1-specific code easy to locate and delete when deprecating V1.
Previously think_end_id was stored in three places: dynamically patched onto the tokenizer, as self._think_end_id on the Scheduler, and on model_config. Consolidate to model_config.think_end_id as the single source of truth. Remove the tokenizer patch and scheduler private field. Pass think_end_id explicitly to create_grammar_backend instead of reading it from the tokenizer.
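The consolidation described above can be sketched as follows. The attribute names (`model_config.think_end_id`, the scheduler's tokenizer) come from the PR description; the class structure, the `</think>` default token, and the stub tokenizer are illustrative assumptions, not the actual SGLang implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # After this change: the single source of truth for think_end_id.
    think_end_id: Optional[int] = None


class Scheduler:
    def __init__(self, tokenizer, model_config: ModelConfig,
                 think_end_token: str = "</think>"):  # default is an assumption
        self.tokenizer = tokenizer
        self.model_config = model_config
        # Before: the id was also patched onto the tokenizer object and
        # cached as self._think_end_id. Now it is computed once and stored
        # only on model_config (with the guard the review suggests).
        ids = tokenizer.encode(think_end_token, add_special_tokens=False)
        self.model_config.think_end_id = ids[0] if ids else None

    def is_reasoning_end(self, token_id: int) -> bool:
        # Readers consult model_config directly; no hasattr guard on the
        # tokenizer is needed anymore.
        return token_id == self.model_config.think_end_id
```

With this shape, any component that receives `model_config` (grammar backend, output processor) reads the same value without probing the tokenizer.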
Code Review
This pull request refactors the handling of the think_end_id by moving it from the tokenizer and scheduler instance variables into the model_config. This change streamlines how reasoning tokens are identified across the grammar backend and scheduler output processor. Feedback was provided regarding a potential IndexError in scheduler.py when encoding the think_end_token, suggesting a safety check for cases where the tokenizer might return an empty list.
```python
self.model_config.think_end_id = self.tokenizer.encode(
    reasoning_parser.detector.think_end_token, add_special_tokens=False
)[0]
```
The current implementation assumes that self.tokenizer.encode will always return at least one token ID. If the tokenizer fails to encode the think_end_token (e.g., due to an empty string or a tokenizer-specific quirk), this will raise an IndexError. Additionally, if the token is split into multiple IDs, only the first one is captured, which might lead to incorrect reasoning detection later. Consider adding a safety check.
Suggested change:

```diff
-self.model_config.think_end_id = self.tokenizer.encode(
-    reasoning_parser.detector.think_end_token, add_special_tokens=False
-)[0]
+ids = self.tokenizer.encode(
+    reasoning_parser.detector.think_end_token, add_special_tokens=False
+)
+self.model_config.think_end_id = ids[0] if ids else None
```
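To see the failure modes the review describes, consider a tokenizer whose `encode` can return an empty list or split the end token into several ids. The tokenizer below is a stub for illustration; the token strings and ids are made up:

```python
class StubTokenizer:
    """Illustrative stand-in; a real tokenizer would come from the model."""

    def __init__(self, table):
        self.table = table

    def encode(self, text, add_special_tokens=False):
        # Unknown text encodes to an empty list in this stub.
        return list(self.table.get(text, []))


tok = StubTokenizer({
    "</think>": [151649],          # normal case: one id
    "<|think_end|>": [27, 91, 2000],  # pathological: split into three ids
})

# Unguarded [0] raises IndexError when encode() returns [].
try:
    tok.encode("", add_special_tokens=False)[0]
    raised = False
except IndexError:
    raised = True

# Guarded version from the suggestion: falls back to None instead.
ids = tok.encode("", add_special_tokens=False)
think_end_id = ids[0] if ids else None

# Multi-id split: only the first id is captured, which the review flags
# as a possible source of incorrect reasoning detection later.
split_ids = tok.encode("<|think_end|>", add_special_tokens=False)
```

Here `raised` is `True` for the unguarded access, `think_end_id` ends up `None` under the guard, and `split_ids[0]` silently discards the trailing ids of the split token.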
/tag-and-rerun-ci

/rerun-test test_reasoning_tokens.py

❌

/rerun-test test_eagle_infer_a.py

/rerun-test test_eagle_constrained_decoding.py

/rerun-test test_ngram_speculative_decoding.py

✅

✅

✅

/rerun-test test_reasoning.py

✅

/rerun-test test_reasoning.py test_eagle_infer_a.py test_eagle_constrained_decoding.py test_ngram_speculative_decoding.py

✅
Stacked on #22146.
`think_end_id` was stored in three redundant locations:

- `self.tokenizer.think_end_id` (dynamically patched, accessed via `hasattr` guard)
- `self._think_end_id` (Scheduler private field)
- `self.model_config.think_end_id` (added in Isolate spec V1 path in decode post-processing #22146)

This consolidates to `model_config.think_end_id` as the single canonical source. The tokenizer patch and scheduler private field are removed. `create_grammar_backend` now takes `think_end_id` as an explicit parameter instead of reading it from the tokenizer.

Changes (4 files, net -3 lines):

- `scheduler.py`: compute once, store only on `model_config`
- `scheduler_output_processor_mixin.py`: read from `self.model_config`
- `base_grammar_backend.py`: accept `think_end_id` param, drop `hasattr` guard
- `grammar_manager.py`: pass `model_config.think_end_id` to grammar backend
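The `create_grammar_backend` change can be sketched as below. Only the parameter name `think_end_id` and the `hasattr`-guard-vs-explicit-param contrast come from the PR; the function signatures and the `GrammarBackend` class are simplified placeholders:

```python
from typing import Optional


class GrammarBackend:
    """Placeholder for the real grammar backend."""

    def __init__(self, vocab_size: int, think_end_id: Optional[int]):
        self.vocab_size = vocab_size
        self.think_end_id = think_end_id


# Before: the backend probed the tokenizer for a dynamically patched
# attribute, guarded because the patch might not have happened.
def create_grammar_backend_old(server_args, tokenizer, vocab_size: int):
    think_end_id = getattr(tokenizer, "think_end_id", None)
    return GrammarBackend(vocab_size, think_end_id)


# After: callers pass model_config.think_end_id explicitly, so the
# backend no longer depends on tokenizer patching at all.
def create_grammar_backend(server_args, tokenizer, vocab_size: int,
                           think_end_id: Optional[int] = None):
    return GrammarBackend(vocab_size, think_end_id)
```

The explicit parameter makes the dependency visible at the call site (`grammar_manager.py` passes `model_config.think_end_id`) instead of hiding it behind a runtime attribute probe.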