Skip to content

[CI failure] Revert "[BugFix] Fix the issue of thinker requests being preempted, causing shape mismatch."#3648

Closed
Gaohan123 wants to merge 1 commit into
mainfrom
revert-3147-bugfix
Closed

[CI failure] Revert "[BugFix] Fix the issue of thinker requests being preempted, causing shape mismatch."#3648
Gaohan123 wants to merge 1 commit into
mainfrom
revert-3147-bugfix

Conversation

@Gaohan123
Copy link
Copy Markdown
Collaborator

@Gaohan123 Gaohan123 added ready label to trigger buildkite CI merge-test label to trigger buildkite merge test CI labels May 15, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 145dc177b4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 120 to 122
task = {
"pooling_output": pooling_output,
"request": request,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore preemption guard before enqueuing async chunks

save_async now unconditionally enqueues every chunk task, so when async-chunk requests are preempted and request.num_computed_tokens rolls back, stale chunks can be sent again while put_req_chunk continues forward. Those duplicate/stale payloads are then merged on the receiver side via tensor/list concatenation, which can desynchronize thinker/talker sequence shapes and reintroduce the preemption shape-mismatch failure under normal preempt-and-resume scheduling.

Useful? React with 👍 / 👎.

Comment on lines +999 to +1000
if start_index >= len(thinker_output_token_ids) - 1:
# When the tokens output by the thinker are exhausted, an EOS token needs to be appended.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid emitting EOS before the final decode embedding

The new exhaustion check uses start_index >= len(ids.output) - 1, which marks decode as exhausted one step early. Because decode handoff starts with start_index=1, chunks with only 1–2 thinker output tokens (a common early/short-response case) immediately emit EOS/pad and skip projecting actual decode embeddings, leading to truncated or empty spoken output.

Useful? React with 👍 / 👎.

@amy-why-3459
Copy link
Copy Markdown
Contributor

Please wait.

@hsliuustc0106
Copy link
Copy Markdown
Collaborator

check #3650

@hsliuustc0106 hsliuustc0106 added the duplicate This issue or pull request already exists label May 15, 2026
@hsliuustc0106
Copy link
Copy Markdown
Collaborator

#3650 merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists merge-test label to trigger buildkite merge test CI ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants