[V0 Deprecation] Drop V0 encoder-decoder runner#23300
Conversation
There was a problem hiding this comment.
Code Review
This pull request effectively removes the generic encoder-decoder model runner and its associated utilities, tests, and documentation. The changes are consistent with the goal of dropping this functionality. My review focuses on ensuring the removal is clean and complete. I've identified one area where there might be leftover code that could be removed to improve maintainability.
There was a problem hiding this comment.
With the removal of the generic encoder-decoder runner, the is_encoder_decoder property on ModelConfig appears to be obsolete. Its primary user in vllm/worker/worker.py has been removed.
This test now only checks that two decoder-only models are not encoder-decoder models, which provides limited value.
If ModelConfig.is_encoder_decoder is no longer used throughout the codebase, it should be removed from ModelConfig to avoid confusion and dead code. Consequently, this test should also be removed. If the property is still used by specialized models (e.g., Whisper), this test should be updated to include positive test cases for those models to ensure its functionality is still correctly verified.
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
cef656e to
8b7946d
Compare
|
This is one of the last use cases which V1 still doesn't support, so I think we should drop this last |
|
@DarkLight1337 We don't plan to continue support these models. |
|
#21088 should enable it for Whisper at least, it is getting close to being ready |
|
@DarkLight1337 Oh right. Whisper is the only exception. I can wait until the PR is merged. |
bart will be a pretty small addition on top, i think. Maybe wait to see how it looks? I was going to look at it next. |
|
With more progress on the Transformers backend side we may be able to run the encoder in Transformers (Transformers can handle any encoder caching) and then the decoder in vLLM (via Transformers backend) where we pass the embeddings from the encoder directly into the decoder in vLLM. I can't say for certain exactly how this will work as there's a massive ongoing refactor to allow us to swap out the attention module for the vLLM attention module in encoder-decoder models. Unfortunately I don't have a timeline for when this will be ready. |
|
This pull request has merge conflicts that must be resolved before it can be |
|
@russellb please don't do it. We will not support any encoder decoder model besides Whisper. Whisper is the ONLY exception. |
Got it - was just looking to close gaps for things I know people are using (unfortunately) |
NickLucche
left a comment
There was a problem hiding this comment.
I think @russellb 's PR is still relying on a bunch of these tests to ensure correctness.
Can we factor them out to make sure the eventual whisper addition still has reasonable test coverage?
|
@WoosukKwon would you like me to pick this up? |
|
Looks like this work was moved to a new PR (already merged) #24907 |
Summary
Testing
pre-commit run --files docs/models/supported_models.md tests/core/block/test_block_manager.py tests/models/registry.py tests/models/test_registry.py tests/test_config.py tests/test_inputs.py vllm/model_executor/models/registry.py vllm/worker/worker.py(failed: command not found)pip install pre-commit(failed: Tunnel connection failed: 403 Forbidden)pytest tests/test_inputs.py::test_parse_single_batch_empty -q(failed: ModuleNotFoundError: No module named 'torch')https://chatgpt.com/codex/tasks/task_b_68a521482e3c832db72d14979dc1ff68