Torch Profiling with DeepEP PD Disaggregation #5462
Closed: liz-badada wants to merge 11 commits into sgl-project:main from liz-badada:PD_Dissaggregation_Torch_Profiling
Commits (11):
- 7d78dec support torch profiling with pd dissaggregation (liz-badada)
- 534e924 Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- ca39efa Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- 8f6a0a8 Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- 9199968 code style (jychen21)
- fa80abe Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- a23eaf8 Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- c5f2b9d Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- e91a804 Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (liz-badada)
- d1ee558 Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (jychen21)
- 4430304 Merge branch 'main' into PD_Dissaggregation_Torch_Profiling (jychen21)
@@ -5,7 +5,9 @@

```python
import asyncio
import dataclasses
import logging
import os
import random
import time
import urllib
from itertools import chain
from typing import List, Optional
```
@@ -49,6 +51,10 @@ def __init__(self, prefill_configs: List[PrefillConfig], decode_servers: List[st

```python
        self.prefill_configs = prefill_configs
        self.prefill_servers = [p.url for p in prefill_configs]
        self.decode_servers = decode_servers
        self.profiling = False

        profile_dir = os.getenv("SGLANG_TORCH_PROFILER_DIR", "./tmp")
        os.makedirs(profile_dir, exist_ok=True)

    def select_pair(self):
        # TODO: return some message instead of panic
```

Collaborator comment on the profile_dir lines: "Creating …"
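The `__init__` change reads the profiler output directory from the `SGLANG_TORCH_PROFILER_DIR` environment variable, falls back to `./tmp`, and creates it eagerly. A minimal standalone sketch of that lookup follows. The helper name `resolve_profile_dir` is hypothetical, and returning an absolute path is an added assumption so that the location is unambiguous regardless of the process's working directory.

```python
import os


def resolve_profile_dir(default: str = "./tmp") -> str:
    """Hypothetical helper mirroring the directory logic in the PR's __init__.

    Reads SGLANG_TORCH_PROFILER_DIR, falls back to `default`, creates the
    directory if it is missing, and returns it as an absolute path.
    """
    profile_dir = os.getenv("SGLANG_TORCH_PROFILER_DIR", default)
    # exist_ok=True makes repeated calls (e.g. after a restart) harmless.
    os.makedirs(profile_dir, exist_ok=True)
    return os.path.abspath(profile_dir)
```

In the diff shown, the load balancer only forwards `/start_profile` and `/stop_profile` to the prefill and decode servers, so this directory is created on the load balancer's host; where each server writes its traces depends on that server's own configuration.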
@@ -59,6 +65,46 @@ def select_pair(self):

```python
        decode_server = random.choice(self.decode_servers)
        return prefill_config.url, prefill_config.bootstrap_port, decode_server

    async def start_profile(self):
        """Start profiling on all servers."""
        if self.profiling:
            return {"success": False, "message": "Profiling is already in progress"}

        self.profiling = True
        async with aiohttp.ClientSession() as session:
            tasks = []
            for server in chain(self.prefill_servers, self.decode_servers):
                tasks.append(session.post(f"{server}/start_profile"))

            responses = await asyncio.gather(*tasks)
            success = all(response.status == 200 for response in responses)
            return {
                "success": success,
                "message": (
                    "Profiling started" if success else "Failed to start profiling"
                ),
            }

    async def stop_profile(self):
        """Stop profiling on all servers."""
        if not self.profiling:
            return {"success": False, "message": "Profiling is not in progress"}

        self.profiling = False
        async with aiohttp.ClientSession() as session:
            tasks = []
            for server in chain(self.prefill_servers, self.decode_servers):
                tasks.append(session.post(f"{server}/stop_profile"))

            responses = await asyncio.gather(*tasks)
            success = all(response.status == 200 for response in responses)
            return {
                "success": success,
                "message": (
                    "Profiling stopped" if success else "Failed to stop profiling"
                ),
            }

    async def generate(
        self, modified_request, prefill_server, decode_server, endpoint
    ) -> ORJSONResponse:
```
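In `start_profile` above, `self.profiling` is set to `True` before any request is sent. If one server returns a non-200 status, or raises (for example on a connection error, which `asyncio.gather` propagates by default), the flag stays `True` and later `start_profile` calls are refused until `stop_profile` is called. `stop_profile`, by contrast, clears the flag before sending, so a failed stop still unblocks the next start. A minimal sketch of the same fan-out that rolls the start flag back on failure and reports which servers failed is below. It is not the PR's code: the `ProfileCoordinator` class and the injected `post(url)` coroutine returning a status code are hypothetical stand-ins for the load balancer and `aiohttp.ClientSession.post`, chosen so the logic runs without a network.

```python
import asyncio
from itertools import chain
from typing import Awaitable, Callable, List

# Hypothetical transport: POST to `url` and return the HTTP status code.
PostFn = Callable[[str], Awaitable[int]]


class ProfileCoordinator:
    """Hypothetical stand-in for the profiling state in the load balancer."""

    def __init__(self, prefill_servers: List[str], decode_servers: List[str], post: PostFn):
        self.prefill_servers = prefill_servers
        self.decode_servers = decode_servers
        self.post = post
        self.profiling = False

    async def _broadcast(self, path: str) -> List[str]:
        """POST `path` to every server and return the servers that failed."""
        servers = list(chain(self.prefill_servers, self.decode_servers))
        # return_exceptions=True turns a connection error into a result entry
        # instead of aborting the gather and skipping the bookkeeping below.
        results = await asyncio.gather(
            *(self.post(f"{server}{path}") for server in servers),
            return_exceptions=True,
        )
        # An exception object never equals 200, so it counts as a failure.
        return [server for server, result in zip(servers, results) if result != 200]

    async def start_profile(self) -> dict:
        if self.profiling:
            return {"success": False, "message": "Profiling is already in progress"}
        # Claim the flag before awaiting so a concurrent call is refused.
        self.profiling = True
        failed = await self._broadcast("/start_profile")
        if failed:
            # Roll back so the next start_profile call is not blocked.
            self.profiling = False
            return {"success": False, "message": "Failed to start profiling", "failed_servers": failed}
        return {"success": True, "message": "Profiling started", "failed_servers": []}

    async def stop_profile(self) -> dict:
        if not self.profiling:
            return {"success": False, "message": "Profiling is not in progress"}
        # As in the PR: clear the flag first, so a failed stop never wedges it.
        self.profiling = False
        failed = await self._broadcast("/stop_profile")
        if failed:
            return {"success": False, "message": "Failed to stop profiling", "failed_servers": failed}
        return {"success": True, "message": "Profiling stopped", "failed_servers": []}
```

Rolling back only the flag leaves any server that did start still profiling; a fuller version might send `/stop_profile` to the servers that succeeded. In a real load balancer, `post` could wrap `session.post(url)` in `async with` and return `resp.status`, which also releases each response's connection; the PR's `gather` over bare `session.post(...)` calls does not release responses explicitly.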
@@ -321,6 +367,22 @@ async def register(obj: PDRegistryRequest):

```python
    return Response(status_code=200)


@app.post("/start_profile")
async def start_profile():
    """Start profiling on all servers."""
    if load_balancer is None:
        raise HTTPException(status_code=500, detail="Load balancer not initialized")
    return await load_balancer.start_profile()


@app.post("/stop_profile")
async def stop_profile():
    """Stop profiling on all servers."""
    if load_balancer is None:
        raise HTTPException(status_code=500, detail="Load balancer not initialized")
    return await load_balancer.stop_profile()


def run(prefill_configs, decode_addrs, host, port):
    global load_balancer
    load_balancer = MiniLoadBalancer(prefill_configs, decode_addrs)
```
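With both routes registered on the load balancer, a profiling session is a POST to `/start_profile`, the workload to be traced, and a POST to `/stop_profile`. A minimal standard-library client sketch follows; the function names and the `lb_url` argument (for example `http://127.0.0.1:8000`) are hypothetical and not part of the PR.

```python
import json
import urllib.request


def build_profile_request(lb_url: str, action: str) -> urllib.request.Request:
    """Build a bodiless POST to the load balancer's /start_profile or /stop_profile."""
    if action not in ("start", "stop"):
        raise ValueError(f"action must be 'start' or 'stop', got {action!r}")
    return urllib.request.Request(f"{lb_url.rstrip('/')}/{action}_profile", method="POST")


def toggle_profiling(lb_url: str, action: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON dict the load balancer returns."""
    request = build_profile_request(lb_url, action)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Because the handlers return the `{"success": ..., "message": ...}` dict directly, a failed fan-out still arrives as HTTP 200, so a caller should check the `success` field rather than the status code. The uninitialized-load-balancer case is a real HTTP 500, which `urlopen` raises as `urllib.error.HTTPError`.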
Review comment: Is changing the output directory necessary?