[Merge]PR-25233 From Original vLLM repo by fake0fan#97
[Merge]PR-25233 From Original vLLM repo by fake0fan#97kechengliu97 wants to merge 1 commit intoJiusiServe:mainfrom kechengliu97:main
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Code Review
This pull request introduces a disaggregated encoder architecture, a significant new feature allowing the vision encoder to run separately from the prefill/decode stages. This is a well-structured and comprehensive change, including new abstractions (ECConnector), modifications to the scheduler and model runner, extensive tests, and clear documentation. My review focuses on a few issues found in the example scripts that demonstrate this new feature.
| --num-prompts $NUM_PROMPTS \ | ||
| --port $PROXY_PORT | ||
|
|
||
| PIDS+=($!) |
There was a problem hiding this comment.
| --num-prompts $NUM_PROMPTS \ | ||
| --port $PROXY_PORT | ||
|
|
||
| PIDS+=($!) |
There was a problem hiding this comment.
| async def healthy(urls): | ||
| if not urls: | ||
| return "empty" | ||
| for u in urls: | ||
| try: | ||
| async with encode_session.get(f"{u}/health") as resp: | ||
| resp.raise_for_status() | ||
| except Exception: | ||
| return "unhealthy" | ||
| return "healthy" |
There was a problem hiding this comment.
The healthy function incorrectly uses encode_session to check the health of all clusters (encode, prefill, and decode). It should use the appropriate session for each cluster (prefill_session for prefill, decode_session for decode). This can lead to incorrect health status reporting.
The healthy function should be modified to accept a session object as an argument. Then, at the call site on line 348, you should pass the correct session for each cluster.
async def healthy(urls, session):
if not urls:
return "empty"
if not session:
# This case happens for prefill when it's disabled.
return "empty"
for u in urls:
try:
async with session.get(f"{u}/health") as resp:
resp.raise_for_status()
except Exception:
return "unhealthy"
return "healthy"…enImage (JiusiServe#97) Signed-off-by: samithuang <285365963@qq.com> Signed-off-by: SamitHuang <285365963@qq.com>
Uh oh!
There was an error while loading. Please reload this page.