[BugFix] Clear spec_token_ids for preempted req to prevent grammar conflicts on resumption by izhuhaoran · Pull Request #31955 · vllm-project/vllm

izhuhaoran · 2026-01-08T08:51:37Z

Purpose

The unit test v1/e2e/test_async_scheduling.py::test_with_spec_decoding fails intermittently. The failure occurs with the following configuration:

test_sampling_params = [
    dict(structured_outputs=struct_outputs),
]

# test_preemption, executor, async_scheduling,
# spec_config, test_prefill_chunking
test_configs = [
    (True, "uni", False, spec_config_short, True),
]

With additional debug logs, we observed that request 5-849fcee3 triggered a grammar AssertionError after being preempted and subsequently resumed:

INFO 01-08 23:05:22 [scheduler.py:782] Request 5-849fcee3 is preempted, its spec_token_ids=[330]
··· some steps later ···
INFO 01-08 23:05:23 [scheduler.py:634] Resuming preempted request 5-849fcee3, its spec_token_ids=[330]
INFO 01-08 23:05:23 [scheduler.py:727] scheduler.schedule: scheduled_spec_decode_tokens={}
··· next step ···
INFO 01-08 23:05:23 [scheduler.py:727] scheduler.schedule: scheduled_spec_decode_tokens={'5-849fcee3': [330]}
ERROR 01-08 23:05:23 [backend_xgrammar.py:180] Failed to advance FSM for request 5-849fcee3 for tokens 330. Please file an issue.
ERROR 01-08 23:05:23 [core.py:902] EngineCore encountered a fatal error.
ERROR 01-08 23:05:23 [core.py:902] Traceback (most recent call last):
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 893, in run_engine_core
ERROR 01-08 23:05:23 [core.py:902]     engine_core.run_busy_loop()
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 920, in run_busy_loop
ERROR 01-08 23:05:23 [core.py:902]     self._process_engine_step()
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 953, in _process_engine_step
ERROR 01-08 23:05:23 [core.py:902]     outputs, model_executed = self.step_fn()
ERROR 01-08 23:05:23 [core.py:902]                               ^^^^^^^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 353, in step
ERROR 01-08 23:05:23 [core.py:902]     grammar_output = self.scheduler.get_grammar_bitmask(scheduler_output)
ERROR 01-08 23:05:23 [core.py:902]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/core/sched/scheduler.py", line 1046, in get_grammar_bitmask
ERROR 01-08 23:05:23 [core.py:902]     bitmask = self.structured_output_manager.grammar_bitmask(
ERROR 01-08 23:05:23 [core.py:902]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/structured_output/__init__.py", line 271, in grammar_bitmask
ERROR 01-08 23:05:23 [core.py:902]     assert accepted, (token, req_id, scheduled_spec_decode_tokens)
ERROR 01-08 23:05:23 [core.py:902]            ^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902] AssertionError: (330, '5-849fcee3', {'5-849fcee3': [330]})

Root Cause

When a request is preempted, its spec_token_ids are not cleared, so stale draft tokens are scheduled after the request resumes:

Step N:   Request in running queue, generates draft tokens
Step N+1: Request is preempted, spec_token_ids = [A, B] (not cleared)
Step N+2: Request resumes from waiting → running (this is prefill!)
          - Prefill does NOT generate draft tokens
          - update_draft_token_ids skips this request
          - spec_token_ids remains [A, B]
Step N+3: Request in running queue (now decode)
          - Scheduler uses stale spec_token_ids [A, B]
          - get_grammar_bitmask() tries to accept outdated tokens
          - AssertionError!
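
The timeline above can be replayed with a toy model. This is only a sketch, not vLLM code: `ToyGrammar`, `ToyRequest`, `run_timeline`, and the expected token sequence are hypothetical stand-ins. It assumes the grammar advances by one sampled token during the resumed prefill step, so a draft token proposed before preemption no longer matches the grammar state when it is finally validated.

```python
from dataclasses import dataclass, field


class ToyGrammar:
    """Hypothetical stand-in for a grammar FSM: accepts one fixed token sequence."""

    def __init__(self, expected: list[int]) -> None:
        self.expected = expected
        self.pos = 0

    def accept(self, token: int) -> bool:
        # Advance only if the token matches the next expected token.
        if self.pos < len(self.expected) and self.expected[self.pos] == token:
            self.pos += 1
            return True
        return False


@dataclass
class ToyRequest:
    grammar: ToyGrammar
    spec_token_ids: list[int] = field(default_factory=list)


def run_timeline(clear_on_preempt: bool) -> bool:
    """Replay steps N..N+3; return True if the drafts validated at N+3 are accepted."""
    req = ToyRequest(ToyGrammar(expected=[330, 11, 12]))

    # Step N: a decode step drafts token 330, valid for the grammar state at that time.
    req.spec_token_ids = [330]

    # Step N+1: the request is preempted.
    if clear_on_preempt:
        req.spec_token_ids = []

    # Step N+2: resumed as prefill. One token is sampled and the grammar advances,
    # but no new drafts are produced, so spec_token_ids is left untouched.
    assert req.grammar.accept(330)

    # Step N+3: the scheduled draft tokens are validated against the grammar.
    return all(req.grammar.accept(t) for t in req.spec_token_ids)
```

In this toy, `run_timeline(clear_on_preempt=False)` returns `False`, which corresponds to the failing `assert accepted` in the traceback, while `run_timeline(clear_on_preempt=True)` returns `True` because no stale draft is left to validate. The sampled token happens to equal the stale draft here; the failure only requires that the grammar state has changed since the draft was proposed.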

Fix

Clear spec_token_ids for preempted requests in update_from_output().
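
The diff itself is not shown on this page, so the following is only a minimal sketch of the idea, not the actual patch. `ToyRequest`, `clear_preempted_spec_tokens`, and the `preempted_req_ids` argument are hypothetical names chosen for illustration; how the scheduler tracks preempted requests inside update_from_output() is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    request_id: str
    spec_token_ids: list[int] = field(default_factory=list)


def clear_preempted_spec_tokens(
    requests: dict[str, ToyRequest], preempted_req_ids: set[str]
) -> None:
    """Drop the draft tokens of every preempted request (hypothetical helper)."""
    for req_id in preempted_req_ids:
        req = requests.get(req_id)
        if req is not None:
            # Stale drafts must not be scheduled once the request resumes.
            req.spec_token_ids = []


requests = {
    "5-849fcee3": ToyRequest("5-849fcee3", spec_token_ids=[330]),
    "other": ToyRequest("other", spec_token_ids=[7, 8]),
}
clear_preempted_spec_tokens(requests, preempted_req_ids={"5-849fcee3"})
```

Only the preempted request loses its draft tokens; requests that keep running retain theirs.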

Verification

The issue is resolved after applying this patch: across repeated runs, the unit test no longer fails.
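
Because the failure is intermittent, a single green run proves little. One way to repeat the test is sketched below. The helper names are hypothetical, and the test ID is copied from this description, so the path may need adjusting to the checkout layout.

```python
import subprocess
import sys
from collections.abc import Callable, Sequence

# Test ID as quoted in this PR description; adjust the path to your checkout.
TEST_ID = "v1/e2e/test_async_scheduling.py::test_with_spec_decoding"


def run_pytest(cmd: Sequence[str]) -> int:
    """Run one pytest invocation and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def count_failures(
    runs: int,
    test_id: str = TEST_ID,
    runner: Callable[[Sequence[str]], int] = run_pytest,
) -> int:
    """Run the test `runs` times and count the runs with a non-zero exit code."""
    cmd = [sys.executable, "-m", "pytest", test_id]
    return sum(1 for _ in range(runs) if runner(cmd) != 0)
```

Calling `count_failures(runs=20)` (the count is arbitrary) before and after the patch gives a rough failure rate. The `runner` parameter exists only so the loop can be exercised without launching pytest.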

gemini-code-assist

Code Review

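This pull request fixes a critical bug: when structured outputs are in use, a preempted request could cause a fatal error on resumption. The root cause is that the grammar state and the speculative token state were not reset during preemption, leaving the request in an inconsistent state when it was rescheduled.

The proposed fix resolves this correctly by adding logic to _preempt_request:

If the request uses structured outputs, reset the grammar's finite-state machine (grammar.reset()).
Clear any existing speculative token IDs (request.spec_token_ids = []).

These changes ensure that a preempted request is in a clean state when it returns to the waiting queue, which prevents state conflicts on resumption. The fix is well targeted, and the implementation is concise and correct. I have no further comments.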

izhuhaoran · 2026-01-08T09:40:28Z

@robertgshaw2-redhat @njhill could you please review this PR when you have time?

izhuhaoran · 2026-01-08T10:27:08Z

Currently, this patch causes another unit test case to fail with "Failed to advance FSM"; I'm working on it.

izhuhaoran · 2026-01-08T15:48:22Z

Currently, this patch causes another unit test case to fail with "Failed to advance FSM"; I'm working on it.

Now, this PR is ready.

izhuhaoran · 2026-01-08T16:20:26Z

The same issue is fixed in #31944, so I'm closing this PR.

[Fix] Reset grammar state and spec token IDs on request resumption to…

361d992

… prevent conflicts Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

izhuhaoran requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, robertgshaw2-redhat and ywang96 as code owners January 8, 2026 08:51

mergify bot added the v1 label Jan 8, 2026

gemini-code-assist bot reviewed Jan 8, 2026

izhuhaoran mentioned this pull request Jan 8, 2026

[CI Failure]: backend_xgrammar.py: Failed to advance FSM for request #31876

Closed

3 tasks

izhuhaoran marked this pull request as draft January 8, 2026 10:25

only reset spec_token_ids in update_from_output

fcf3225

Signed-off-by: izhuhaoran <izhuhaoran@qq.com>

izhuhaoran changed the title ~~[BugFix] Reset grammar state to prevent conflicts when request is resumed~~ [BugFix] Clear spec_token_ids for preempted req to prevent grammar conflicts on resumption Jan 8, 2026

izhuhaoran marked this pull request as ready for review January 8, 2026 15:38

izhuhaoran closed this Jan 8, 2026

izhuhaoran wants to merge 2 commits into vllm-project:main from izhuhaoran:fix-struct-preempt
