
[BugFix] Clear spec_token_ids for preempted req to prevent grammar conflicts on resumption#31955

Closed
izhuhaoran wants to merge 2 commits into vllm-project:main from izhuhaoran:fix-struct-preempt

Conversation


@izhuhaoran izhuhaoran commented Jan 8, 2026

Purpose

Fixes #31876

The unit test v1/e2e/test_async_scheduling.py::test_with_spec_decoding fails intermittently. The failure occurs with the following configuration:

test_sampling_params = [
    dict(structured_outputs=struct_outputs),
]

# test_preemption, executor, async_scheduling,
# spec_config, test_prefill_chunking
test_configs = [
    (True, "uni", False, spec_config_short, True),
]

With additional debug logs, we observed that request 5-849fcee3 triggered a grammar AssertionError after being preempted and subsequently resumed:

INFO 01-08 23:05:22 [scheduler.py:782] Request 5-849fcee3 is preempted, its spec_token_ids=[330]
··· some steps later ···
INFO 01-08 23:05:23 [scheduler.py:634] Resuming preempted request 5-849fcee3, its spec_token_ids=[330]
INFO 01-08 23:05:23 [scheduler.py:727] scheduler.schedule: scheduled_spec_decode_tokens={}
··· next step ···
INFO 01-08 23:05:23 [scheduler.py:727] scheduler.schedule: scheduled_spec_decode_tokens={'5-849fcee3': [330]}
ERROR 01-08 23:05:23 [backend_xgrammar.py:180] Failed to advance FSM for request 5-849fcee3 for tokens 330. Please file an issue.
ERROR 01-08 23:05:23 [core.py:902] EngineCore encountered a fatal error.
ERROR 01-08 23:05:23 [core.py:902] Traceback (most recent call last):
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 893, in run_engine_core
ERROR 01-08 23:05:23 [core.py:902]     engine_core.run_busy_loop()
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 920, in run_busy_loop
ERROR 01-08 23:05:23 [core.py:902]     self._process_engine_step()
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 953, in _process_engine_step
ERROR 01-08 23:05:23 [core.py:902]     outputs, model_executed = self.step_fn()
ERROR 01-08 23:05:23 [core.py:902]                               ^^^^^^^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/engine/core.py", line 353, in step
ERROR 01-08 23:05:23 [core.py:902]     grammar_output = self.scheduler.get_grammar_bitmask(scheduler_output)
ERROR 01-08 23:05:23 [core.py:902]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/core/sched/scheduler.py", line 1046, in get_grammar_bitmask
ERROR 01-08 23:05:23 [core.py:902]     bitmask = self.structured_output_manager.grammar_bitmask(
ERROR 01-08 23:05:23 [core.py:902]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902]   File "/mnt/debugger/zhr/vllm_official/vllm/v1/structured_output/__init__.py", line 271, in grammar_bitmask
ERROR 01-08 23:05:23 [core.py:902]     assert accepted, (token, req_id, scheduled_spec_decode_tokens)
ERROR 01-08 23:05:23 [core.py:902]            ^^^^^^^^
ERROR 01-08 23:05:23 [core.py:902] AssertionError: (330, '5-849fcee3', {'5-849fcee3': [330]})

Root Cause

When a request is preempted, its spec_token_ids is not cleared, so stale draft tokens are reused after the request resumes:

Step N:   Request in running queue, generates draft tokens
Step N+1: Request is preempted, spec_token_ids = [A, B] (not cleared)
Step N+2: Request resumes from waiting → running (this is prefill!)
          - Prefill does NOT generate draft tokens
          - update_draft_token_ids skips this request
          - spec_token_ids remains [A, B]
Step N+3: Request in running queue (now decode)
          - Scheduler uses stale spec_token_ids [A, B]
          - get_grammar_bitmask() tries to accept outdated tokens
          - AssertionError!
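The timeline above can be replayed with a toy model. This is only a sketch under stated assumptions: ToyGrammar, ToyRequest, and replay_timeline are hypothetical names, not vLLM classes, and the grammar is reduced to a single fixed token sequence. It assumes the resumed prefill step samples a real token and advances the grammar, which is one plausible way the leftover draft token ends up invalid.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGrammar:
    """Hypothetical grammar that accepts exactly one fixed token sequence."""
    expected: list[int]
    pos: int = 0

    def accept(self, token: int) -> bool:
        # Advance only if the token matches the next expected one.
        if self.pos < len(self.expected) and self.expected[self.pos] == token:
            self.pos += 1
            return True
        return False

    def validates(self, drafts: list[int]) -> bool:
        # Check draft tokens against the current state, then roll back.
        start = self.pos
        ok = all(self.accept(t) for t in drafts)
        self.pos = start
        return ok


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a scheduler request."""
    req_id: str
    spec_token_ids: list[int] = field(default_factory=list)


def replay_timeline() -> bool:
    grammar = ToyGrammar(expected=[10, 330, 20])
    req = ToyRequest("req-0")

    # Step N: decode emits token 10 and drafts [330] for the next position.
    assert grammar.accept(10)
    req.spec_token_ids = [330]

    # Step N+1: the request is preempted; spec_token_ids is left untouched.

    # Step N+2: resume as prefill. A real token is sampled (assumed to be
    # 330 here) and the grammar advances, but no new drafts are produced.
    assert grammar.accept(330)

    # Step N+3: decode validates the leftover draft against the new state.
    return grammar.validates(req.spec_token_ids)


stale_accepted = replay_timeline()
print("stale draft accepted:", stale_accepted)
```

In this toy run the grammar has already moved past the position the draft was proposed for, so the final check rejects token 330, mirroring the assertion failure in the log.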

Fix

Clear spec_token_ids for preempted requests in update_from_output(), so that no stale draft tokens survive until the request is resumed.
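The idea of the fix can be sketched as follows. This is not the PR's actual diff: ToyRequest and clear_preempted_drafts are hypothetical names used for illustration, and the real scheduler's data structures are assumed to differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a scheduler request."""
    req_id: str
    spec_token_ids: list[int] = field(default_factory=list)


def clear_preempted_drafts(
    requests: dict[str, ToyRequest], preempted_ids: set[str]
) -> None:
    """Drop draft tokens of every preempted request (illustrative helper)."""
    for req_id in preempted_ids:
        req = requests.get(req_id)
        if req is not None:
            # Drafts were proposed for a state that will be recomputed on
            # resumption, so they must not be scheduled afterwards.
            req.spec_token_ids = []


requests = {
    "req-0": ToyRequest("req-0", spec_token_ids=[330]),
    "req-1": ToyRequest("req-1", spec_token_ids=[42, 43]),
}
clear_preempted_drafts(requests, preempted_ids={"req-0"})
print({rid: r.spec_token_ids for rid, r in requests.items()})
```

Only the preempted request loses its drafts; requests that keep running retain theirs for normal verification.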

Verification

With this patch applied, the failure no longer reproduces across repeated runs of the unit test.

… prevent conflicts

Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

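This pull request fixes a critical bug: when structured outputs are in use, a preempted request can cause a fatal error when it is resumed. The root cause is that the grammar state and the speculative token state are not reset during preemption, which leaves the request in an inconsistent state when it is rescheduled.

The proposed fix correctly addresses this by adding logic to _preempt_request:

  1. If the request uses structured outputs, reset the grammar's finite state machine (grammar.reset()).
  2. Clear any existing speculative token IDs (request.spec_token_ids = []).

These changes ensure that a preempted request is in a clean state when it returns to the waiting queue, which prevents state conflicts on resumption. The fix is well targeted, and the implementation is concise and correct. I have no further comments.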

@izhuhaoran
Contributor Author

@robertgshaw2-redhat @njhill could you please review this PR when you have time?

@izhuhaoran izhuhaoran marked this pull request as draft January 8, 2026 10:25
@izhuhaoran
Contributor Author

Currently, this patch causes another unit test case to fail with "Failed to advance FSM"; I'm working on it.

Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
@izhuhaoran izhuhaoran changed the title from "[BugFix] Reset grammar state to prevent conflicts when request is resumed" to "[BugFix] Clear spec_token_ids for preempted req to prevent grammar conflicts on resumption" Jan 8, 2026
@izhuhaoran izhuhaoran marked this pull request as ready for review January 8, 2026 15:38
@izhuhaoran
Contributor Author

> Currently, this patch causes another unit test case to fail with "Failed to advance FSM"; I'm working on it.

This PR is now ready.

@izhuhaoran
Contributor Author

The same issue is fixed in #31944, so I'm closing this PR.

@izhuhaoran izhuhaoran closed this Jan 8, 2026

Development

Successfully merging this pull request may close these issues.

[CI Failure]: backend_xgrammar.py: Failed to advance FSM for request
