[API] Add APIs for online profiling of diffusion models #1451
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -20,7 +20,7 @@ | |
| import httpx | ||
| import vllm.envs as envs | ||
| from fastapi import APIRouter, Depends, File, Form, HTTPException, Request, UploadFile | ||
| -from fastapi.responses import JSONResponse, StreamingResponse | ||
| +from fastapi.responses import JSONResponse, Response, StreamingResponse | ||
| from PIL import Image | ||
| from starlette.datastructures import State | ||
| from starlette.routing import Route | ||
|
|
@@ -929,6 +929,45 @@ async def show_available_models(raw_request: Request) -> JSONResponse: | |
| ) | ||
|
|
||
|
|
||
| # Profiling API endpoints | ||
| def _get_engine_client(raw_request: Request) -> AsyncOmni: | ||
| engine_client = getattr(raw_request.app.state, "engine_client", None) | ||
| if engine_client is None: | ||
| raise HTTPException( | ||
| status_code=HTTPStatus.SERVICE_UNAVAILABLE.value, | ||
| detail="Engine not initialized.", | ||
| ) | ||
| return engine_client | ||
|
|
||
|
|
||
| @router.post("/start_profile") | ||
| async def start_profile(raw_request: Request): | ||
| """Start profiling for the running server. | ||
|
|
||
| Enables torch profiling to capture CPU/CUDA activities, memory allocation, | ||
| and other performance metrics. Use /stop_profile to stop and save the trace. | ||
| """ | ||
| logger.info("Starting profiler...") | ||
| engine_client = _get_engine_client(raw_request) | ||
| await engine_client.start_profile() | ||
| logger.info("Profiler started.") | ||
| return Response(status_code=200) | ||
|
Comment on lines +952 to +954
The new |
||
|
|
||
|
|
||
| @router.post("/stop_profile") | ||
| async def stop_profile(raw_request: Request): | ||
| """Stop profiling and save the trace. | ||
|
|
||
| Stops the profiler started by /start_profile and saves the trace file. | ||
| The trace location is determined by the VLLM_TORCH_PROFILER_DIR environment variable. | ||
| """ | ||
| logger.info("Stopping profiler...") | ||
| engine_client = _get_engine_client(raw_request) | ||
| await engine_client.stop_profile() | ||
| logger.info("Profiler stopped.") | ||
| return Response(status_code=200) | ||
|
|
||
|
|
||
| # Image generation API endpoints | ||
`AsyncOmniDiffusion.stop_profile()` invokes `DiffusionEngine.stop_profile()` but does not return its result, so callers always receive `None`. In the new async stage path, `handle_profiler_task_async` does `result_data = stage_engine.stop_profile() or {}` and forwards that to the orchestrator, which means profiling artifacts are always dropped from the response even when traces were successfully written. This breaks the new result-collection flow added in this commit.
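To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the real classes: `StubEngine`, `BuggyWrapper`, `FixedWrapper`, and `collect_profile_result` are hypothetical stand-ins, the `trace_path` key is an invented result shape, and the wrapper's `stop_profile()` is assumed to be a coroutine. The point is only that a wrapper which calls the engine without `return` hands `None` to the caller, so the `or {}` fallback always wins.

```python
import asyncio


class StubEngine:
    """Hypothetical stand-in for the diffusion engine (not the real class)."""

    def stop_profile(self) -> dict:
        # Assumed result shape; the real payload may differ.
        return {"trace_path": "/tmp/trace.json"}


class BuggyWrapper:
    """Mirrors the reported behaviour: the engine result is discarded."""

    def __init__(self, engine: StubEngine) -> None:
        self.engine = engine

    async def stop_profile(self) -> None:
        self.engine.stop_profile()  # result dropped, caller gets None


class FixedWrapper:
    """Possible fix: propagate whatever the engine returns."""

    def __init__(self, engine: StubEngine) -> None:
        self.engine = engine

    async def stop_profile(self) -> dict:
        return self.engine.stop_profile()


async def collect_profile_result(stage_engine) -> dict:
    # Same fallback pattern as quoted in the comment above; with the
    # buggy wrapper the left-hand side is always None, so {} is used.
    return await stage_engine.stop_profile() or {}


if __name__ == "__main__":
    print(asyncio.run(collect_profile_result(BuggyWrapper(StubEngine()))))
    print(asyncio.run(collect_profile_result(FixedWrapper(StubEngine()))))
```

If the real method is synchronous rather than a coroutine, the same one-line change applies: return the engine's result instead of discarding it.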