Unify think_end_id to model_config as single source of truth#22148
Conversation
Move reasoning token tracking into V1 verify phase (eagle_info, ngram_info) so it stays alongside output_ids/check_finished/grammar. Isolate V1's post-processing block with early continue, making the V1-specific code easy to locate and delete when deprecating V1.
Previously think_end_id was stored in three places: dynamically patched onto the tokenizer, as self._think_end_id on the Scheduler, and on model_config. Consolidate to model_config.think_end_id as the single source of truth. Remove the tokenizer patch and scheduler private field. Pass think_end_id explicitly to create_grammar_backend instead of reading it from the tokenizer.
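The consolidation described above can be sketched as follows. The attribute names (`model_config.think_end_id`, the scheduler's tokenizer) come from the PR description; the class structure, the `</think>` default token, and the stub tokenizer are illustrative assumptions, not the actual SGLang implementation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # After this change: the single source of truth for think_end_id.
    think_end_id: Optional[int] = None


class Scheduler:
    def __init__(self, tokenizer, model_config: ModelConfig,
                 think_end_token: str = "</think>"):  # default is an assumption
        self.tokenizer = tokenizer
        self.model_config = model_config
        # Before: the id was also patched onto the tokenizer object and
        # cached as self._think_end_id. Now it is computed once and stored
        # only on model_config (with the guard the review suggests).
        ids = tokenizer.encode(think_end_token, add_special_tokens=False)
        self.model_config.think_end_id = ids[0] if ids else None

    def is_reasoning_end(self, token_id: int) -> bool:
        # Readers consult model_config directly; no hasattr guard on the
        # tokenizer is needed anymore.
        return token_id == self.model_config.think_end_id
```

With this shape, any component that receives `model_config` (grammar backend, output processor) reads the same value without probing the tokenizer.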
Code Review
This pull request refactors the handling of the think_end_id by moving it from the tokenizer and scheduler instance variables into the model_config. This change streamlines how reasoning tokens are identified across the grammar backend and scheduler output processor. Feedback was provided regarding a potential IndexError in scheduler.py when encoding the think_end_token, suggesting a safety check for cases where the tokenizer might return an empty list.
```python
self.model_config.think_end_id = self.tokenizer.encode(
    reasoning_parser.detector.think_end_token, add_special_tokens=False
)[0]
```
The current implementation assumes that self.tokenizer.encode will always return at least one token ID. If the tokenizer fails to encode the think_end_token (e.g., due to an empty string or a tokenizer-specific quirk), this will raise an IndexError. Additionally, if the token is split into multiple IDs, only the first one is captured, which might lead to incorrect reasoning detection later. Consider adding a safety check.
Suggested change:

```diff
-self.model_config.think_end_id = self.tokenizer.encode(
-    reasoning_parser.detector.think_end_token, add_special_tokens=False
-)[0]
+ids = self.tokenizer.encode(
+    reasoning_parser.detector.think_end_token, add_special_tokens=False
+)
+self.model_config.think_end_id = ids[0] if ids else None
```
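To see the failure modes the review describes, consider a tokenizer whose `encode` can return an empty list or split the end token into several ids. The tokenizer below is a stub for illustration; the token strings and ids are made up:

```python
class StubTokenizer:
    """Illustrative stand-in; a real tokenizer would come from the model."""

    def __init__(self, table):
        self.table = table

    def encode(self, text, add_special_tokens=False):
        # Unknown text encodes to an empty list in this stub.
        return list(self.table.get(text, []))


tok = StubTokenizer({
    "</think>": [151649],          # normal case: one id
    "<|think_end|>": [27, 91, 2000],  # pathological: split into three ids
})

# Unguarded [0] raises IndexError when encode() returns [].
try:
    tok.encode("", add_special_tokens=False)[0]
    raised = False
except IndexError:
    raised = True

# Guarded version from the suggestion: falls back to None instead.
ids = tok.encode("", add_special_tokens=False)
think_end_id = ids[0] if ids else None

# Multi-id split: only the first id is captured, which the review flags
# as a possible source of incorrect reasoning detection later.
split_ids = tok.encode("<|think_end|>", add_special_tokens=False)
```

Here `raised` is `True` for the unguarded access, `think_end_id` ends up `None` under the guard, and `split_ids[0]` silently discards the trailing ids of the split token.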
/tag-and-rerun-ci

/rerun-test test_reasoning_tokens.py

❌

/rerun-test test_eagle_infer_a.py

/rerun-test test_eagle_constrained_decoding.py

/rerun-test test_ngram_speculative_decoding.py

✅

✅

✅

/rerun-test test_reasoning.py

✅

/rerun-test test_reasoning.py test_eagle_infer_a.py test_eagle_constrained_decoding.py test_ngram_speculative_decoding.py

✅
Stacked on #22146.
`think_end_id` was stored in three redundant locations:

- `self.tokenizer.think_end_id` (dynamically patched, accessed via `hasattr` guard)
- `self._think_end_id` (Scheduler private field)
- `self.model_config.think_end_id` (added in Isolate spec V1 path in decode post-processing #22146)

This consolidates to `model_config.think_end_id` as the single canonical source. The tokenizer patch and scheduler private field are removed. `create_grammar_backend` now takes `think_end_id` as an explicit parameter instead of reading it from the tokenizer.

Changes (4 files, net -3 lines):

- `scheduler.py`: compute once, store only on `model_config`
- `scheduler_output_processor_mixin.py`: read from `self.model_config`
- `base_grammar_backend.py`: accept `think_end_id` param, drop `hasattr` guard
- `grammar_manager.py`: pass `model_config.think_end_id` to grammar backend
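The `create_grammar_backend` change can be sketched as below. Only the parameter name `think_end_id` and the `hasattr`-guard-vs-explicit-param contrast come from the PR; the function signatures and the `GrammarBackend` class are simplified placeholders:

```python
from typing import Optional


class GrammarBackend:
    """Placeholder for the real grammar backend."""

    def __init__(self, vocab_size: int, think_end_id: Optional[int]):
        self.vocab_size = vocab_size
        self.think_end_id = think_end_id


# Before: the backend probed the tokenizer for a dynamically patched
# attribute, guarded because the patch might not have happened.
def create_grammar_backend_old(server_args, tokenizer, vocab_size: int):
    think_end_id = getattr(tokenizer, "think_end_id", None)
    return GrammarBackend(vocab_size, think_end_id)


# After: callers pass model_config.think_end_id explicitly, so the
# backend no longer depends on tokenizer patching at all.
def create_grammar_backend(server_args, tokenizer, vocab_size: int,
                           think_end_id: Optional[int] = None):
    return GrammarBackend(vocab_size, think_end_id)
```

The explicit parameter makes the dependency visible at the call site (`grammar_manager.py` passes `model_config.think_end_id`) instead of hiding it behind a runtime attribute probe.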