Skip to content

[BugFix] Async scheduling: handle model forward errors more cleanly#31611

Merged
vllm-bot merged 1 commit intovllm-project:mainfrom
njhill:async-exec-errs
Jan 4, 2026
Merged

[BugFix] Async scheduling: handle model forward errors more cleanly#31611
vllm-bot merged 1 commit intovllm-project:mainfrom
njhill:async-exec-errs

Conversation

@njhill
Copy link
Copy Markdown
Member

@njhill njhill commented Jan 2, 2026

If the model runner execute_model() method raises an exception, it will be logged via a future callback, but the core loop will subsequently fail with a misleading secondary exception since sample_tokens() will return None:

`AttributeError: 'NoneType' object has no attribute 'sampled_token_ids'`

This PR changes the step_with_batch_queue() method in the core loop to instead raise the root cause exception from the execute_model() future inline in this case, which also removes the need for the error callback.

If the model runner execute_model() raises an exception, it will be logged via a future callback, but the core loop will subsequently fail with a misleading exception since sample_tokens() will return None:

`AttributeError: 'NoneType' object has no attribute 'sampled_token_ids'`

This PR changes the step_with_batch_queue() method to instead raise the root cause exception from the execute_model() future inline in this case, which also removes the need for the error callback.

Signed-off-by: njhill <nickhill123@gmail.com>
@mergify mergify bot added the v1 label Jan 2, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the error handling in asynchronous scheduling to propagate the root cause exception from execute_model() failures, which is a good improvement. However, the implementation introduces a critical bug where a successful execution path can lead to a RuntimeError. My review includes a comment with a suggested fix for this issue.

@njhill njhill marked this pull request as ready for review January 2, 2026 00:49
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 2, 2026
@njhill njhill changed the title [Core] Async scheduling: handle model forward errors more cleanly [BugFix] Async scheduling: handle model forward errors more cleanly Jan 2, 2026
ohsono added a commit to ohsono/vllm that referenced this pull request Jan 2, 2026
Signed-off-by: Hochan Son <ohsono@gmail.com>
@njhill
Copy link
Copy Markdown
Member Author

njhill commented Jan 2, 2026

CI failure is unrelated

@vllm-bot vllm-bot merged commit b53b89f into vllm-project:main Jan 4, 2026
48 of 50 checks passed
@njhill njhill deleted the async-exec-errs branch January 4, 2026 19:35
LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…llm-project#31611)

Signed-off-by: njhill <nickhill123@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
ohsono added a commit to ohsono/vllm that referenced this pull request Feb 25, 2026
Signed-off-by: Hochan Son <ohsono@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants