EVS Support (Video tokens pruning) #22980
This pull request has merge conflicts that must be resolved before it can be |
Code Review
This pull request adds support for Efficient Video Sampling (EVS) by introducing a new interface for models to return custom embeddings and positions, which enables video token pruning. While the overall direction is good, there are several critical issues that need to be addressed. The new interface signature in interfaces.py is inconsistent with its usage in gpu_model_runner.py. More importantly, the logic for updating request states with the pruned positions appears to be incorrect, as it applies the same update to all requests in a batch. Additionally, there are several leftover debugging statements and commented-out code that should be cleaned up before merging.
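The batch-update concern above can be illustrated with a minimal sketch. It assumes the model hands back pruned positions keyed by request, so each request's state receives only its own positions. `RequestState`, `apply_pruned_positions`, and the dictionary layout are hypothetical names for illustration, not the actual structures in gpu_model_runner.py.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:
    # Hypothetical stand-in for a per-request record; not vLLM's actual class.
    req_id: str
    positions: list = field(default_factory=list)


def apply_pruned_positions(states, pruned_positions):
    """Update each request with its own pruned positions.

    states: dict mapping req_id -> RequestState
    pruned_positions: dict mapping req_id -> list of int positions
    Requests that are absent from pruned_positions are left untouched.
    """
    for req_id, positions in pruned_positions.items():
        # Copy the list so later edits to the input do not leak into the state.
        states[req_id].positions = list(positions)


states = {
    "req-a": RequestState("req-a", positions=[0, 1, 2, 3]),
    "req-b": RequestState("req-b", positions=[0, 1, 2, 3]),
}
# Only req-a had video tokens pruned in this step.
apply_pruned_positions(states, {"req-a": [0, 1, 3]})
```

Keying the update by request id, rather than looping over the whole batch with a single positions value, keeps requests that were not pruned unchanged.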
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
3ad3321 to 5e784b0
…e/evs-support-clean
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> # Conflicts: # vllm/v1/worker/gpu_model_runner.py
DarkLight1337
left a comment
Sorry for the delay, let's get this merged
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Purpose
Enable Efficient Video Sampling (EVS) to prune redundant video tokens.
EVS reduces time to first token (TTFT) and inter-token latency (ITL) by pruning less important vision tokens before they reach the LLM.
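As a rough illustration of what pruning redundant video tokens can look like, here is a minimal sketch. It assumes a token is redundant when its embedding barely changes from the same patch position in the previous frame, and that the first frame is always kept in full. The function names, the cosine-based score, and the `pruning_rate` parameter are assumptions for this example and are not taken from the PR's code.

```python
import math


def cosine_similarity(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compute_retention_mask(frames, pruning_rate):
    """Return one bool per token (frame-major order); True means keep.

    frames: list of frames, each a list of token embedding vectors, with the
            same number of tokens in every frame.
    pruning_rate: fraction of tokens to drop, in [0, 1).
    """
    if not 0.0 <= pruning_rate < 1.0:
        raise ValueError("pruning_rate must be in [0, 1)")
    tokens_per_frame = len(frames[0])
    total = len(frames) * tokens_per_frame
    # Assumption: never keep fewer tokens than one full frame.
    keep_count = max(tokens_per_frame, int(total * (1.0 - pruning_rate)))

    scores = []
    for t, frame in enumerate(frames):
        for p, token in enumerate(frame):
            if t == 0:
                # First frame has no predecessor, so it always ranks highest.
                scores.append(math.inf)
            else:
                # Dissimilarity to the same patch position in the previous frame.
                scores.append(1.0 - cosine_similarity(token, frames[t - 1][p]))

    # Keep the tokens that changed the most; ties keep the earlier token.
    order = sorted(range(total), key=lambda i: scores[i], reverse=True)
    keep = set(order[:keep_count])
    return [i in keep for i in range(total)]


# Two frames of two tokens each: the first patch is static, the second changes.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
mask = compute_retention_mask(frames, pruning_rate=0.25)
```

In this toy input the static patch in the second frame is the one dropped, so fewer tokens reach the LLM while the changing content is preserved.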
Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.