[BugFix] Clear spec_token_ids for preempted req to prevent grammar conflicts on resumption#31955
Closed
izhuhaoran wants to merge 2 commits intovllm-project:mainfrom
Closed
[BugFix] Clear spec_token_ids for preempted req to prevent grammar conflicts on resumption#31955izhuhaoran wants to merge 2 commits intovllm-project:mainfrom
izhuhaoran wants to merge 2 commits intovllm-project:mainfrom
Conversation
… prevent conflicts Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Contributor
There was a problem hiding this comment.
Code Review
此拉取请求解决了一个关键错误:在使用结构化输出时,被抢占的请求在恢复时可能导致致命错误。问题根源在于抢占期间未重置语法状态和推测性令牌(speculative token)状态,导致请求重新调度时出现状态不一致。
所提出的修复方案通过在 _preempt_request 中添加逻辑来正确解决此问题:
- 如果请求使用结构化输出,则重置语法的有限状态机(
grammar.reset())。 - 清除所有已存在的推测性令牌 ID(
request.spec_token_ids = [])。
这些更改确保了被抢占的请求在返回等待队列时处于干净状态,从而防止了恢复时的状态冲突。此修复目标明确,实现简洁且正确。我没有其他意见。
3 tasks
Contributor
Author
|
@robertgshaw2-redhat @njhill could you please review this PR when you have time ? |
Contributor
Author
|
Currently, this patch cause other UT case "Failed to advance FSM", and I'm working on this |
Signed-off-by: izhuhaoran <izhuhaoran@qq.com>
Contributor
Author
Now, this PR is ready. |
Contributor
Author
|
same solved in #31944, close this PR |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Fixes #31876
The unit test
v1/e2e/test_async_scheduling.py::test_with_spec_decodingfails intermittently. The failure occurs with the following configuration:With additional debug logs, we observed that request
5-849fcee3triggered a grammar AssertionError after being preempted and subsequently resumed:Root Cause
When a request is preempted,
spec_token_idswas not cleared, leading to stale tokens being used after resumption:Fix
Clear
spec_token_idsfor preempted requests inupdate_from_output()Verification
The issue is resolved after applying this patch; repeated multiple times, the unit tests show no recurrence.