Motivation.
vLLM's V1 architecture has been in production for a while now, and we have seen good results from the transition from V0 to V1. We are now in the process of deprecating the V0 code path and removing the technical debt from vLLM.
As we complete this effort, it is time to begin planning for the 1.0 release, which will occur sometime this fall.
The V1 engine update focused on refactoring and eliminating technical debt in vLLM's internal subsystems. With the major version update to 1.0, we will take the opportunity to clean up vLLM's external interfaces and adopt a semantic versioning scheme.
This is increasingly important as more interfaces become pluggable in vLLM V1.
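As a quick illustration of what adopting semantic versioning would commit us to, the sketch below encodes the semver contract (MAJOR.MINOR.PATCH): only a MAJOR bump may break the public API. The function names here are purely illustrative, not part of vLLM.

```python
# Illustration of the semantic versioning contract (semver.org):
# MAJOR bumps may break the public API, MINOR adds backward-compatible
# features, PATCH fixes bugs. Helper names are hypothetical.

def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def may_break(old: str, new: str) -> bool:
    # A plugin built against `old` should only break if MAJOR changed.
    return parse(new)[0] > parse(old)[0]

print(may_break("1.0.0", "1.1.3"))  # False: minor/patch bumps are safe
print(may_break("1.1.3", "2.0.0"))  # True: major bump may break plugins
```

Under this scheme, stabilized pluggable interfaces could only change incompatibly at a major release boundary.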
- V0 Deprecation Issue: [RFC]: Deprecating vLLM V0 #18571
Proposed Change.
We are developing a work plan towards this 1.0 release.
Major items we have targeted are:
- Stabilize and document the existing pluggable interfaces (`Scheduler`, `KVConnector`, `ModelRunner`, `Model`)
- Clarify and document which subsystems are intended to be pluggable (public APIs) vs. not (private APIs)
- Stabilize and document the `AsyncLLM` and `LLM` interfaces, clarifying what is public vs. private
- Redo and simplify the vLLM configuration setup (EngineArgs, environment variables)
- Redo and simplify the vLLM logging setup
- Simplify vLLM Prometheus metrics
- Overhaul docs and recipes
- Overhaul performance tracking system
- Website
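To make the "stabilize the pluggable interfaces" item concrete, here is a minimal sketch of what a stabilized plugin contract and registry could look like. All names (`SchedulerInterface`, `register_scheduler`, etc.) are hypothetical and do not reflect vLLM's actual API; this only illustrates the shape of a documented public extension point.

```python
from abc import ABC, abstractmethod

class SchedulerInterface(ABC):
    """Hypothetical public contract a custom scheduler plugin implements."""

    @abstractmethod
    def schedule(self, waiting: list[str]) -> list[str]:
        """Pick which waiting request IDs run in the next engine step."""

class FIFOScheduler(SchedulerInterface):
    """Trivial first-in-first-out policy for illustration."""

    def schedule(self, waiting: list[str]) -> list[str]:
        return list(waiting)

# A name-keyed registry, as a plugin system might expose one.
SCHEDULER_REGISTRY: dict[str, type[SchedulerInterface]] = {}

def register_scheduler(name: str, cls: type[SchedulerInterface]) -> None:
    SCHEDULER_REGISTRY[name] = cls

register_scheduler("fifo", FIFOScheduler)
scheduler = SCHEDULER_REGISTRY["fifo"]()
print(scheduler.schedule(["req-1", "req-2"]))  # ['req-1', 'req-2']
```

The point of stabilization is that, once documented, a contract like this would be covered by the semantic versioning guarantees above, while everything outside it stays explicitly private.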
Feedback Period.
End of August
CC List.
@simon-mo @WoosukKwon @youkaichao @njhill @tlrmchlsmth @mgoin @ywang96 @yeqcharlotte @houseroad
Any Other Things.
What else are we missing?