nick's streaming changes 2 #2

Merged
joshuadeng merged 6 commits into joshuadeng:streaming_support from njhill:streaming_support_nick2
Jan 24, 2026
Conversation


@njhill njhill commented Jan 23, 2026

Re-based continuation of #1.

Signed-off-by: Nick Hill <nickhill123@gmail.com>

and don't support prompt_embeds with input streaming for now
@njhill njhill changed the base branch from main to streaming_support January 23, 2026 23:02
@njhill njhill mentioned this pull request Jan 23, 2026
Comment on lines +874 to +877
if update is None:
    # Streaming-input request finished.
    self.finish_requests(session.request_id, RequestStatus.FINISHED_ABORTED)
    return
Owner
I don't think update can be None from the handle_stopped and add_request callsites.

Owner
If we want to add this guardrail, we should make the update parameter's type hint Optional.

Author
Actually it can be; this is the case where a final sentinel request is sent with resumable = False: https://github.com/joshuadeng/vllm/pull/2/files#diff-35f85e99eae8897d78a45f6a8d21bb69f9d8fe4a51e072bf299118dadac612f3R48-R49.

But the type hint is wrong, and it may be clearer to handle this case outside, since it only applies to one of the callsites; will push a small update for that now.
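The arrangement discussed above — an Optional type hint with the None sentinel handled at the one callsite that can produce it — might look roughly like the following. This is a hypothetical, self-contained sketch: Scheduler, Session, and the toy update type are illustrative stand-ins, not the PR's actual classes.

```python
from enum import Enum, auto
from typing import Optional


class RequestStatus(Enum):
    FINISHED_ABORTED = auto()


class Session:
    def __init__(self, request_id: str):
        self.request_id = request_id
        self.status: Optional[RequestStatus] = None


class Scheduler:
    """Toy scheduler showing Optional-typed sentinel handling."""

    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}

    def finish_requests(self, request_id: str, status: RequestStatus) -> None:
        self.sessions[request_id].status = status

    def handle_stopped(self, session: Session,
                       update: Optional[list[int]]) -> None:
        # update=None is the sentinel for a final request sent with
        # resumable=False: the streaming-input request is finished.
        if update is None:
            self.finish_requests(session.request_id,
                                 RequestStatus.FINISHED_ABORTED)
            return
        # ... otherwise a real update would be applied here ...


sched = Scheduler()
s = Session("req-1")
sched.sessions[s.request_id] = s
sched.handle_stopped(s, None)  # sentinel: request is aborted
print(s.status)                # RequestStatus.FINISHED_ABORTED
```

With the Optional annotation on only this entry point, callsites that always pass a real update keep a non-Optional signature and the guardrail lives in one place.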

Comment on lines +881 to +889
num_computed_tokens = session.num_computed_tokens
kept_output_tokens = session._all_token_ids[
    session.num_prompt_tokens : num_computed_tokens
]
del session._all_token_ids[num_computed_tokens:]
session._output_token_ids.clear()
assert session.prompt_token_ids is not None
# Extend prompt with kept output tokens.
session.prompt_token_ids.extend(kept_output_tokens)
Owner
nice!
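The bookkeeping in the snippet above can be demonstrated in isolation: output tokens that were actually computed are folded back into the prompt, while uncomputed tokens are dropped, so a resumed request sees the kept tokens as prompt. The Session class below is a minimal made-up stand-in mirroring the fields the diff touches, not the PR's real session object.

```python
class Session:
    """Minimal stand-in for the PR's session object (illustrative only)."""

    def __init__(self, prompt_token_ids, output_token_ids, num_computed_tokens):
        self.prompt_token_ids = list(prompt_token_ids)
        self._output_token_ids = list(output_token_ids)
        self._all_token_ids = list(prompt_token_ids) + list(output_token_ids)
        self.num_prompt_tokens = len(prompt_token_ids)
        self.num_computed_tokens = num_computed_tokens


def fold_outputs_into_prompt(session: Session) -> None:
    # Same steps as the reviewed snippet: slice out the output tokens
    # that were actually computed, truncate everything past the computed
    # point, clear the output list, and extend the prompt with the kept
    # tokens so the resumed request treats them as prompt.
    num_computed_tokens = session.num_computed_tokens
    kept_output_tokens = session._all_token_ids[
        session.num_prompt_tokens : num_computed_tokens
    ]
    del session._all_token_ids[num_computed_tokens:]
    session._output_token_ids.clear()
    assert session.prompt_token_ids is not None
    session.prompt_token_ids.extend(kept_output_tokens)


# 3 prompt tokens, 3 generated tokens, but only 2 of them computed.
s = Session(prompt_token_ids=[1, 2, 3], output_token_ids=[10, 11, 12],
            num_computed_tokens=5)
fold_outputs_into_prompt(s)
print(s.prompt_token_ids)   # [1, 2, 3, 10, 11]
print(s._all_token_ids)     # [1, 2, 3, 10, 11]
print(s._output_token_ids)  # []
```

Note the uncomputed token 12 is discarded entirely, so nothing stale survives into the resumed request.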

Signed-off-by: Nick Hill <nickhill123@gmail.com>
@joshuadeng joshuadeng merged commit a75aea5 into joshuadeng:streaming_support Jan 24, 2026