[Refactor] [1/N] to simplify the vLLM serving architecture#28040
[Refactor] [1/N] to simplify the vLLM serving architecture#28040vllm-bot merged 14 commits intovllm-project:mainfrom
Conversation
|
For reference, from @DarkLight1337 in #27882
|
|
@chaunceyjiang @markmc @DarkLight1337 Just opened RFC #28218 , might be related to this refactor. Would love to hear your thoughts! |
ab084a9 to
014ece9
Compare
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the vLLM serving architecture by splitting the monolithic vllm/entrypoints/openai/api_server.py into smaller, more focused modules under vllm/entrypoints/serve/. This is a good improvement for maintainability. I've found two issues: one critical regression where some refactored API endpoints are not registered, and one high-severity issue where an API response type has changed, which could be a breaking change for clients.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-executed refactoring of the vLLM serving architecture by splitting the monolithic api_server.py into a more modular structure with functionality-specific routers. This greatly improves the maintainability and readability of the code. The code has been moved logically into new modules under vllm/entrypoints/serve/. While reviewing the changes, I identified a critical issue with state synchronization for the elastic endpoint scaling feature that could cause problems in a multi-worker production environment. My detailed feedback is in the review comment.
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
83a5ddc to
e4e5489
Compare
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…ect#28040) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
Refer #28218
Split
vllm/entrypoints/openai/api_server.pybased on different functionalities.Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.