[P/D] NIXL Updates #25844
Merged
+11
−3
17 commits
bb79c4d Reduce the Cuda Graph memory footprint when running with DBO (#25779) (SageMoore)
ee10d7e Validate API tokens in constant time (#25781) (russellb)
04c2b26 Add filtering for chat template kwargs (#25794) (russellb)
32335c8 Add option to restrict media domains (#25783) (russellb)
c2fa2d4 [Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL (#25788) (yewentao256)
5aa5811 [CI] Fix FlashInfer AOT in release docker image (#25730) (mgoin)
26a7a33 [Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982) (tlrmchlsmth)
b14773b [Bugfix][NIXL] Fix Async Scheduler timeout issue (#25808) (NickLucche)
6de3d43 [MM] Optimize memory profiling for scattered multimodal embeddings (#… (ywang96)
19e7ab7 [Bugfix] Fix Qwen3-VL regression from #24982 (#25814) (ywang96)
4c34704 [VLM] Update Qwen3-VL max_num_video_tokens calculation for configurab… (Isotr0py)
6fc2f1c updated
87d6b8b updated
a46e572 merged
6ec0c55 updated
74dc751 updated
4181ef7 updated
How would you unit test this scenario?
In #25067 the case I tested was an abort after the prefill request had finished, but @NickLucche rightly asked (AIUI):
Whatever the scenario is ... if it is supposed to happen, we shouldn't have a warning that is not actionable by the user. But a "why this is supposed to happen" comment would be important for maintainability. Something similar to
vllm/vllm/v1/core/sched/scheduler.py
Lines 888 to 893 in 8616300
I agree this should be a debug log.
I will post up a diff later today that:
I'm working on this in #25067 again
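To make the two points raised in this thread concrete (log an expected condition at debug level with a comment explaining why it is expected, and cover the abort-after-prefill case with a unit test), here is a minimal sketch. Everything in it is hypothetical: `TransferTracker`, its methods, and the logger name are illustrative stand-ins and are not taken from the vLLM NIXL connector or scheduler.

```python
import logging
import unittest

logger = logging.getLogger("pd_sketch")  # hypothetical logger name


class TransferTracker:
    """Hypothetical stand-in for the component that tracks KV transfers."""

    def __init__(self):
        self.pending = set()

    def start(self, req_id):
        self.pending.add(req_id)

    def abort(self, req_id):
        self.pending.discard(req_id)

    def on_transfer_done(self, req_id):
        """Return True if the completion matched a tracked request."""
        if req_id not in self.pending:
            # Expected, not an error: the request may have been aborted
            # after its prefill finished, so a completion can arrive for
            # an id we no longer track. The user cannot act on this, so
            # it is logged at debug level rather than as a warning.
            logger.debug("Ignoring completion for untracked request %s", req_id)
            return False
        self.pending.remove(req_id)
        return True


class AbortAfterPrefillTest(unittest.TestCase):
    def test_late_completion_logs_at_debug(self):
        tracker = TransferTracker()
        tracker.start("req-1")
        tracker.abort("req-1")
        # assertLogs fails the test if nothing is logged at DEBUG or above.
        with self.assertLogs(logger, level="DEBUG") as captured:
            handled = tracker.on_transfer_done("req-1")
        self.assertFalse(handled)
        self.assertEqual([r.levelname for r in captured.records], ["DEBUG"])
```

If saved as a module, the test can be run with `python -m unittest`. Asserting on the record's level is what pins down the "debug, not warning" decision, so a later change back to a warning would fail the test.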